0. INTRODUCTION

The notion of behavioral equivalence is a fundamental part of the study
of automata theory. Two definitions of behavioral equivalence occur in the
literature for deterministic machines. One, due to Burks [5], calls two
machines behaviorally equivalent if they define the same function from input
strings to output strings. The other, part of Rabin-Scott automata theory,
calls two machines behaviorally equivalent if they accept the same set of
tapes. The two definitions can be shown to coincide for deterministic machines
by recoding arbitrary output symbols into strings of zeros and ones.
Both definitions have been generalized to probabilistic machines. However,
for probabilistic machines the resulting generalizations are not equivalent.

This paper is concerned with certain kinds of equivalences between
probabilistic machines. Two models will be discussed later in this section in
order to gain insight into the main kinds of equivalences which will be
studied. Of particular interest will be the question of when a probabilistic
sequential machine is equivalent in some sense to a finite deterministic machine.

0.1 THE CONCEPT OF PROBABILISTIC SEQUENTIAL MACHINE

By a probabilistic sequential machine is meant a system which satisfies
one of the following two definitions:

Definition 0.1: A (Moore-type) probabilistic sequential machine
A is a system $A = \langle n, I, S, \Sigma, A(0), \ldots, A(k-1), F, O \rangle$
where

n: a natural number, the number of states;

I: an n-dimensional stochastic vector, the initial state vector;
S: the set of state vectors, $S = \{S_1 = (1,0,\ldots,0), \ldots, S_n = (0,\ldots,0,1)\}$;

$\Sigma$: the alphabet, $\Sigma = \{0,1,2,\ldots,k-1\}$;

A(i), i = 0,1,...,k-1: the n × n switching matrix for input symbol i; $A(i)_{lm}$
is the probability of a transition from state l to state m via
symbol i;

F: the output vector, an n-dimensional column vector whose entries are
real numbers;

O: the output function, $O(S_i) = S_i \cdot F = F_i$, $S_i \in S$.

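Definition 0.2: A (Mealy-type) probabilistic sequential machine is a system

$A = \langle n, I, S, \Sigma, A(0), \ldots, A(k-1), W, P \rangle$

where n, I, S, $\Sigma$, A(0), ..., A(k-1) are as in Definition 0.1 and where the output
function P assigns an output to each pair of a state and an input symbol:

$P(S_i, j), \quad S_i \in S,\ j \in \Sigma.$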

It is an easy matter to show that Definitions 0.1 and 0.2 are equivalent
in the following sense: for every Moore-type probabilistic sequential machine
there is a Mealy-type probabilistic sequential machine whose output is the same
random variable over each input, and vice versa. Consequently, we will be
concerned only with the properties of Moore-type probabilistic sequential
machines, which from now on will be called "sequential machines."

There seem to be many instances of systems like probabilistic sequential
machines in other fields of study not generally thought of as automata theory.
Braines and Svechinsky discuss a system like Definition 0.1 in their
paper "Matrix Structure in Simulation of Learning" [1]. If one takes the
cartesian product of machines of Definition 0.2, one gets the Markov processes
with rewards and alternatives studied in sequential decision theory as
presented by Howard [2]. Matrix games as discussed by Thrall [5] can be
considered as instances of Definition 0.1 in which I and F are strategy vectors
and the game matrix A(x) is defined by a string x. A simple correspondence shows
that the noisy discrete channel of Shannon [8] is equivalent to the system of
Definition 0.2. One would hope that someday probabilistic sequential machines
could become a unifying concept, organizing and providing results for these
diverse fields.

Probabilistic sequential machines are generalizations of the probabilistic
automata of Rabin [4]. If one restricts I to elements of S and sets
$F_i = 0$ or $1$ for $i = 1,2,\ldots,n$, then Definition 0.1 defines probabilistic
automata. Following Rabin, we remark that:

Remark 1: Let $x = i_1 \ldots i_r$, $i_j \in \Sigma$, $j = 1,\ldots,r$.

Then $A(x) = A(i_1) \cdots A(i_r)$, i.e., the switching matrix for a string x is found
by multiplying the matrices for the symbols of x together in order.
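As a concrete illustration of Remark 1, the following Python sketch multiplies the symbol matrices in input order. It is not taken from this report: matrices are stored as lists of rows, exact fractions are used to avoid rounding error, and the two-state machine `A` is hypothetical.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]


def string_matrix(A, x, n):
    """A(x) = A(i_1) A(i_2) ... A(i_r) for x = i_1 ... i_r (Remark 1).

    A maps each input symbol to its n x n switching matrix; the empty
    string yields the identity matrix."""
    M = identity(n)
    for symbol in x:
        M = matmul(M, A[symbol])  # multiply on the right, in input order
    return M


# Hypothetical two-state machine over the alphabet {0, 1}.
A = {
    "0": [[Fr(1, 2), Fr(1, 2)], [Fr(0), Fr(1)]],
    "1": [[Fr(1), Fr(0)], [Fr(1, 3), Fr(2, 3)]],
}
A01 = string_matrix(A, "01", 2)
```

Because every factor is stochastic, each A(x) is stochastic as well: its rows still sum to 1.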

0.2 MODELS OF PROBABILISTIC SEQUENTIAL MACHINES

We consider here two models, one of which can be considered probabilistic
and one of which can be considered deterministic, although both fall within
the framework of probabilistic sequential machines.

Example 0.1. Probabilistic internal operation: A slot-machine

A simple model of a probabilistic sequential machine is a slot-machine.
The static position of the dials represents the present state of the machine.


Usually there are 20 different positions on each dial and 3 dials, for a total
of 8,000 states. The input consists of putting in a coin and pulling a lever,
causing the machine to travel transiently through many states until it settles
down in one state. An output is associated with each state. Nothing (which
is associated with 0) comes out unless the dials all display the same object.

In that case, some change tumbles out (which is associated with the corresponding
real number), usually dependent only on the kind of object being displayed,
i.e., the state of the machine. Such a machine, whose output is controlled by
its states, is known as a "Moore machine" [7]. Each state can be associated
with a number between 1 and 8,000, and the output for each state can be
tabulated in a column vector or 8,000 × 1 matrix. In the formalism, this column
vector will be called the "output vector" and designated by the symbol "F".

The output for state i will be written as $F_i$.

The enormous number of distinct ways in which the lever can be pulled is
prevented from significantly influencing the outcome by spring loading. Hence,
for all practical purposes, there is only one kind of transition law associated
with pulling the lever. If the randomness of transition of the dials
caused by variable factors like dust, friction, humidity, heating and small
vibrations does not change over long periods of time, the probability of a
transition from any state of the dials to any other can be determined
experimentally to any required precision. This situation is summarized formally
in the assumption for probabilistic sequential machines that the transition
probabilities are stationary. Symbolizing the usual lever play of the
machine by L, the transition probabilities can be tabulated in a matrix
A(L), with the entry in the i'th row and j'th column (written $A(L)_{ij}$) being
the probability of a transition from state i to state j via input L.

If there were no other permissible way to affect the rotation of the
dials than a pull of the lever, then the behavior of a slot-machine A
could be described as a finite state Markov chain with rewards and transition
matrix A(L). However, sudden small external shocks during the rotation of
the dials can influence the state transitions of the machine. In order to
model completely how such machines are played, we can consider a finite
repeatable set of such non-standard inputs to the machine. For instance, one
such input might be described as the application of a kick with a prescribed
kinetic energy on a certain spot on the machine occurring 1/5 of a second
after the lever is released. Symbolizing this manner of playing the machine
by K, the transition matrix A(K) could be determined experimentally, since
the input is repeatable. A finite set of such repeatable inputs could be
defined and their effects on the behavior of the machine ascertained.

To find out how strings of L and K inputs to the machine affect its
operation, it is sufficient to multiply the matrices A(L) and A(K) together
in the order specified by the string; e.g., if a string x is LKKLK, then the
transition matrix A(x) is the product $A(L) \cdot A(K) \cdot A(K) \cdot A(L) \cdot A(K)$.

Consider how the dials of the machine might be found initially. If
the dials can be completely observed, the initial state of the machine is
observable. In this case, in the formalism, the initial state i is represented
by a vector I (or a 1 × 8,000 matrix) with a 1 in the i'th component and
zeros elsewhere. On the other hand, the dials may not be completely visible,
and we may wish to specify the average behavior of a large number of machines
run simultaneously, or we may wish to consider the average return from playing
one machine only when it is left by other players on one of a set of
preferred states. In any one of these cases, I can be a stochastic vector
$(I_1, \ldots, I_{8000})$, where $I_i$ is the probability of being in state i at time $t_0$.

In the general case, the state probabilities starting with an initial state
vector I after an input string x are given by $I \cdot A(x)$. Hence the
expected value of output of a machine A starting with initial state distribution
I and output vector F after a string x of inputs has occurred is just

$E_A(x) = I \cdot A(x) \cdot F$,

which is a bilinear form in I and F with form matrix A(x). The variance in
output and other higher moments can be defined analogously.
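The slot-machine formula $E_A(x) = I \cdot A(x) \cdot F$ and the corresponding variance can be sketched in Python as follows. The three-state machine (outputs 0, 5 and 100; inputs L and K) is a hypothetical stand-in for the 8,000-state machine of the text, and the variance formula assumes, as in the Moore model, that state i pays exactly $F_i$.

```python
from fractions import Fraction as Fr


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]


def distribution_after(I, A, x):
    """State distribution I·A(x): apply the symbol matrices in input order."""
    v = list(I)
    for symbol in x:
        v = vec_mat(v, A[symbol])
    return v


def expected_output(I, A, F, x):
    """E_A(x) = I·A(x)·F."""
    return sum(p * f for p, f in zip(distribution_after(I, A, x), F))


def output_variance(I, A, F, x):
    """Variance of the output after x, assuming state i pays exactly F_i:
    sum_i p_i F_i^2 - E_A(x)^2 with p = I·A(x)."""
    p = distribution_after(I, A, x)
    second_moment = sum(pi * f * f for pi, f in zip(p, F))
    return second_moment - expected_output(I, A, F, x) ** 2


# Hypothetical machine: state 0 pays nothing, state 1 pays 5, state 2 pays 100.
# For simplicity every state uses the same transition row for a given input.
pull = [Fr(3, 4), Fr(1, 5), Fr(1, 20)]
kick = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
A = {"L": [pull] * 3, "K": [kick] * 3}
F = [0, 5, 100]
I = [1, 0, 0]  # dials observed in state 0
```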

Example 0.2. Deterministic internal structure: Chemical production cell

Suppose a chemical tank A is divided into several isolated compartments
$A_1, \ldots, A_n$ by partitions which are interconnected by an electronically
controlled system of pumps and valves. Suppose that there is a finite set of
controls $\Sigma = \{0, 1, \ldots, k-1\}$ and that for each control c a fixed fraction
$v_{ij}$ of the chemical in compartment $A_i$ is pumped into compartment $A_j$. For
every control c in $\Sigma$, the full influence on the redistribution of liquid in the
tank can be described by an n × n matrix A(c) with $v_{ij}$ being $A(c)_{ij}$.
Furthermore, suppose that the liquid being pumped between compartments is a
catalyst which causes production of a desired end product in each compartment
with a different efficiency, i.e., if $F_i$ is the efficiency of $A_i$, then the
output of end product from $A_i$ is $F_i$ times the mass fraction of catalyst in $A_i$.
Note that it is assumed that the output of a compartment depends linearly on
the catalyst present.

The initial state I is an n-component vector with the i'th component
being the mass fraction of catalyst in compartment i. Then

$\sum_{i=1}^{n} I_i = 1$

since the tank is a closed system as far as the catalyst is concerned. The
distribution of mass fractions of catalyst over the compartments after a
sequence of controls $x = i_1 \ldots i_m$ is just

$I \cdot A(i_1) \cdots A(i_m) = I \cdot A(x)$.

That is, $(I \cdot A(x))_i$ is the mass fraction of catalyst in compartment i after
starting with initial distribution I of catalyst fractions over compartments
and the string of control inputs $x = i_1 \ldots i_m$.

The total end product from the tank is the sum of the outputs from each
compartment, $\sum_{i=1}^{n} (I \cdot A(x))_i F_i$, which can be written $I \cdot A(x) \cdot F$ in matrix
notation. This expression has the same form as the expectation of output for the
probabilistic slot-machine, but there are no overt probabilities involved
here. The mass fractions of catalyst play the same role as the probabilities
in the first example. However, the output will still be written like an
expectation, as $E_A(x)$.

The total end product accumulated, $T_m$, for the string of controls x
from time $t_0$ to time $t_0 + m$ is given by adding the output after each initial
substring (prefix) of x, i.e.,

$T_m = E_A(i_1) + E_A(i_1 i_2) + \cdots + E_A(i_1 i_2 \ldots i_m)$.
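A minimal Python sketch of the chemical cell accumulates $T_m$ one control at a time: each step updates the mass fractions $I \cdot A(i_1 \ldots i_j)$ and adds that step's end product. The two-compartment tank, its pumping fractions and its efficiencies are hypothetical.

```python
from fractions import Fraction as Fr


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]


def accumulated_product(I, A, F, controls):
    """Return (T_m, final catalyst distribution) for controls x = i_1 ... i_m, where
    T_m = E_A(i_1) + E_A(i_1 i_2) + ... + E_A(i_1 ... i_m)."""
    v, total = list(I), Fr(0)
    for c in controls:
        v = vec_mat(v, A[c])                      # mass fractions after this control
        total += sum(p * f for p, f in zip(v, F))  # end product made at this step
    return total, v


# Hypothetical tank: control "0" pumps half of compartment 1 into compartment 2;
# control "1" pumps a quarter of compartment 2 back into compartment 1.
A = {
    "0": [[Fr(1, 2), Fr(1, 2)], [Fr(0), Fr(1)]],
    "1": [[Fr(1), Fr(0)], [Fr(1, 4), Fr(3, 4)]],
}
F = [2, 6]  # production efficiencies of the two compartments
I = [1, 0]  # all catalyst starts in compartment 1
T2, final = accumulated_product(I, A, F, "01")
```

The final distribution still sums to 1, reflecting that the tank is closed with respect to the catalyst.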
1. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS
EXPECTATION EQUIVALENT TO A FINITE DETERMINISTIC MACHINE

1.1 THE CONCEPT OF EXPECTATION EQUIVALENCE

In the two models discussed in the introduction, the expected value of
output, $E_A(x)$, played an important role in the physical interpretations. Let
us repeat the definition of the expected value of output.

Definition 1.1: The expected value of output for a probabilistic sequential
machine A is given by

$E_A(x) = I \cdot A(x) \cdot F$ for $x \in \Sigma^*$.

Definition 1.2: Machines A and A' are expectation equivalent, written $A \equiv_E A'$,
if

$E_A(x) = E_{A'}(x)$ for all $x \in \Sigma^*$.
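Definition 1.2 quantifies over all of $\Sigma^*$, yet expectation equivalence can be decided in finitely many steps. The Python sketch below is not from this report; it relies on a standard linear-algebra technique (assumed here): the row vectors $(I \cdot A(x), I' \cdot A'(x))$ span a space of dimension at most n + n', a breadth-first search that extends only linearly independent vectors finds a basis of that space, and $A \equiv_E A'$ holds exactly when $I \cdot A(x) \cdot F - I' \cdot A'(x) \cdot F'$ vanishes on that basis. The machines at the end are hypothetical; the second is obtained from the first by merging two states that have the same output and interchangeable transition behavior.

```python
from collections import deque
from fractions import Fraction as Fr


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]


def add_to_basis(basis, v):
    """Reduce v against an echelon basis {pivot: row}; store it and return True
    if v is linearly independent of the rows already stored."""
    v = [Fr(a) for a in v]
    for p, row in basis.items():
        if v[p] != 0:
            factor = v[p]
            v = [a - factor * b for a, b in zip(v, row)]
    p = next((i for i, a in enumerate(v) if a != 0), None)
    if p is None:
        return False
    pivot = v[p]
    basis[p] = [a / pivot for a in v]
    return True


def expectation_equivalent(I, A, F, I2, A2, F2):
    """Decide E_A(x) == E_A2(x) for all x in Σ*; A and A2 share the alphabet."""
    n = len(I)
    basis, queue = {}, deque([list(I) + list(I2)])  # the pair for x = Λ
    while queue:
        w = queue.popleft()                  # w = (I·A(x), I2·A2(x)) for some x
        if not add_to_basis(basis, w):
            continue                         # already in the span found so far
        e1 = sum(a * f for a, f in zip(w[:n], F))
        e2 = sum(a * f for a, f in zip(w[n:], F2))
        if e1 != e2:
            return False                     # this x distinguishes the machines
        for s in A:                          # successors (I·A(xs), I2·A2(xs))
            queue.append(vec_mat(w[:n], A[s]) + vec_mat(w[n:], A2[s]))
    return True


# Hypothetical 3-state machine in which states 2 and 3 behave alike ...
A = {
    "a": [[Fr(1, 2), Fr(1, 4), Fr(1, 4)], [Fr(1, 3), Fr(2, 3), 0], [Fr(1, 3), 0, Fr(2, 3)]],
    "b": [[0, Fr(1, 2), Fr(1, 2)], [1, 0, 0], [1, 0, 0]],
}
F = [0, 1, 1]
# ... and the 2-state machine obtained by merging states 2 and 3.
A2 = {"a": [[Fr(1, 2), Fr(1, 2)], [Fr(1, 3), Fr(2, 3)]], "b": [[0, 1], [1, 0]]}
F2 = [0, 1]
```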

Recall from Example 0.2 that $E_A(x)$ was the actual output of the chemical
cell and not an expectation. Hence the basic concept of expectation equivalence
is analogous to Burks' definition of behavioral equivalence for the model of
Example 0.2. However, for Example 0.1, the slot-machine, expectation equivalence
is not the generalization of this kind of behavioral equivalence.

The concept of indistinguishability discussed in Chapter 3 seems to be the
appropriate generalization of this kind of behavioral equivalence. Let us
now turn to an example to show how proper coding of the outputs could make
the concept of expected value of output relevant to an unreliable digital
computer.
Example 1.1. Proper choice of output code can make the expected value of
output relevant to the study of real computers. We encode the output so that
$E_A(x)$ is approximately the code for the output for string x. Then expectation
equivalent machines have nearly the same input-output behavior when one
averages it over a large number of programs.


Suppose that for some machine A we have

$I A(x) = (.0000,\ .0625,\ .8750,\ .0625,\ .0000,\ \ldots)$
$I A(y) = (.8750,\ .0625,\ .0000,\ .0000,\ .0625,\ \ldots)$
$I A(z) = (.0000,\ .0000,\ .0000,\ .1250,\ .8750,\ \ldots)$
$F^T = (T,\ X,\ A,\ \ldots)$

with the intent that

x causes an "A" as output
y causes a "T" as output
z causes a ... as output

We can recode the output symbols as follows ($F_i$ is recoded), with the codes written in binary:

$F^T = (100,\ 011,\ 010,\ 001,\ 000,\ \ldots)$

and

$E_A(x) = (010)_2$, which is the code for A;

$E_A(y) = 3.6875 \approx (100)_2$, which is closest to the code for T;

$E_A(z) = (.001)_2$, which is not a code, but

if decoding is used which picks the closest code number, z is associated with
the output whose code is 000.

A more careful choice of code numbers could have made each expectation
equal to a code, simplifying the decoding problem.* However, in a practical
situation only a sample expectation to be decoded can be obtained, and a more
elaborate statistical decision rule than just comparing for equality must be
used in decoding the output symbol.
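The arithmetic of Example 1.1 can be checked with a short Python sketch (exact fractions; the codes are read as binary integers). It reproduces $E_A(x) = 2 = (010)_2$ and $E_A(z) = 1/8 = (.001)_2$, and shows that $E_A(y) = 3.6875$, so y is decoded to $(100)_2$ by the nearest-code rule rather than matched exactly.

```python
from fractions import Fraction as Fr

# First five components of I·A(x), I·A(y), I·A(z) from Example 1.1; each list
# already sums to 1, so the components hidden behind "..." are zero.
dist = {
    "x": [Fr("0.0000"), Fr("0.0625"), Fr("0.8750"), Fr("0.0625"), Fr("0.0000")],
    "y": [Fr("0.8750"), Fr("0.0625"), Fr("0.0000"), Fr("0.0000"), Fr("0.0625")],
    "z": [Fr("0.0000"), Fr("0.0000"), Fr("0.0000"), Fr("0.1250"), Fr("0.8750")],
}
codes = [0b100, 0b011, 0b010, 0b001, 0b000]  # recoded output vector F


def expectation(p):
    """E_A = I·A(.)·F with the recoded F."""
    return sum(pi * c for pi, c in zip(p, codes))


def decode(value):
    """Pick the code number closest to the observed expectation."""
    return min(codes, key=lambda c: abs(value - c))


results = {s: (expectation(p), decode(expectation(p))) for s, p in dist.items()}
```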


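*Proof: Let $X_i$ be the code weight ($X_i = F_i$) and $I \cdot A(z_i) = (p_{i1}, p_{i2}, \ldots, p_{in})$,
$i = 1,2,\ldots,n$.

The condition that $E_A(z_i) = X_i$, $i = 1,2,\ldots,n$, implies that

$$\begin{aligned}
p_{11}X_1 + p_{12}X_2 + \cdots + p_{1n}X_n &= X_1\\
&\ \,\vdots\\
p_{n1}X_1 + p_{n2}X_2 + \cdots + p_{nn}X_n &= X_n
\end{aligned}$$

or equivalently

$$P'X = \begin{pmatrix} p_{11}-1 & p_{12} & \cdots & p_{1n}\\ p_{21} & p_{22}-1 & \cdots & p_{2n}\\ \vdots & & \ddots & \vdots\\ p_{n1} & \cdots & \cdots & p_{nn}-1 \end{pmatrix} \begin{pmatrix} X_1\\ \vdots\\ X_n \end{pmatrix} = 0,$$

which has a non-zero solution iff $\det(P') = 0$.

By definition, an eigenvalue of a matrix M is some number $\lambda_i$ such that

$$\det\begin{pmatrix} m_{11}-\lambda_i & m_{12} & \cdots & m_{1n}\\ m_{21} & m_{22}-\lambda_i & \cdots & m_{2n}\\ \vdots & & \ddots & \vdots\\ m_{n1} & \cdots & \cdots & m_{nn}-\lambda_i \end{pmatrix} = 0.$$

For any stochastic matrix, 1 is an eigenvalue (the all-ones column vector $\mathbf{1}$ satisfies $P\mathbf{1} = \mathbf{1}$).

Hence $\det(P') = 0$ is always true for any choice of probabilities,
and the result follows.

In order for the encoding to be unique we also need $X_i \neq X_j$ for $i \neq j$ (the
solution $X = \mathbf{1}$, for instance, makes all codes equal), but conditions on the
probabilities for this to occur will not be considered here.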


Example 1.2.

Machines A and A' which are expectation equivalent, $A \equiv_E A'$:

$I A(x) F = I A'(x) F' \quad \forall x \in \Sigma^*$,

where $A = \langle I, A(0), A(1), F \rangle$ and $A' = \langle I, A'(0), A'(1), F' \rangle$ with

A(0)  = 

1  1 

0 

0\ 

A(l)  =  /5/5 

1/5 

I/5I 

1/2 

1/4 

1/4 

1/5 

4/5 

0  , 

[l/k 

0 

3/^ 

14/5 

1/5 

0  i 

A'(0)  = 

1  ] 

0 

0  ' 

A'(l)  =  /7/IO 

0 

3/l0\ 

5/8 

0 

3/8 

13/5 

0 

2/5 

( 

,  0 

1/2 

l/2j 

,  9/10 

(7 

0 

1/10/ 

F = F' =

1 ' 

5 

i3/{ 

These machines are expectation equivalent from any initial probability
distribution, I, over states.

The previous example shows that two machines can have very different
switching matrices and still be expectation equivalent. Frequently, studies
of Markov processes are concerned with the location of the zeros in the
transition matrices. The example shows that the locations of the zeros in the
transition matrices are not the only relevant factor in the study of expectation
equivalence. Since graph-theoretic properties of the transition matrices,
such as the accessibility of a state, depend only on where the zeros are, one
would not expect a purely graph-theoretic approach to be very fruitful in
the study of this problem. Hence some of the tools of linear algebra will be
used in addition to graph-theoretic methods.

1.2 THE REDUCTION RELATION R_F

In this section a congruence relation, $R_F$, will be defined so that a
quotient machine can be constructed. States of the quotient machine will
correspond to the distinct values of expectation which occur for input strings.
If the rank of $R_F$ happens to be finite, the machine constructed has a finite
number of states. By attaching a deterministic output device to each state
of the constructed machine, the expectation equivalent deterministic machine
is obtained.

If the rank of $R_F$ is finite, some class of the relation must contain
infinitely many strings, since $\Sigma^*$ is infinite. A necessary condition for $R_F$
to be of finite rank is therefore that it be non-trivial, i.e., that at least two
different strings are contained in some class. This necessary condition produces
strong constraints on the form of the symbol matrices of such probabilistic machines.

Definition 1.3: The reduction relation $R_F$ is given by

$x \mathrel{R_F} y \iff E_A(x) = E_A(y) \;\&\; E_A(xz) = E_A(yz) \quad \forall z \in \Sigma^*,\ \forall I \in S$.

If $\Sigma^*$ contains $\Lambda$, a semigroup identity (the empty string), the definition reduces to

$x \mathrel{R_F} y \iff E_A(xz) = E_A(yz) \quad \forall z \in \Sigma^*,\ \forall I \in S$.

$R_F$ is a congruence relation on $\Sigma^*$ because of the reflexivity, symmetry and transitivity
of "=" and the substitution property in its definition.
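Definition 1.3 also quantifies over all continuations z. Because I ranges over the unit vectors S, $x \mathrel{R_F} y$ says that $A(x)A(z)F = A(y)A(z)F$ for every z, so it suffices to compare A(x) and A(y) on a basis of $\mathrm{span}\{A(z)F : z \in \Sigma^*\}$, which has at most n elements. The Python sketch below is an illustration built on that observation, not a procedure given in this report; it finds such a basis breadth-first, and the three-state machine is hypothetical.

```python
from collections import deque
from fractions import Fraction as Fr


def mat_vec(M, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def string_matrix(A, x, n):
    """A(x) = A(i_1)...A(i_r); the identity for the empty string Λ."""
    M = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for s in x:
        M = matmul(M, A[s])
    return M


def add_to_basis(basis, v):
    """Echelon insertion; True if v is independent of the stored rows."""
    v = [Fr(a) for a in v]
    for p, row in basis.items():
        if v[p] != 0:
            factor = v[p]
            v = [a - factor * b for a, b in zip(v, row)]
    p = next((i for i, a in enumerate(v) if a != 0), None)
    if p is None:
        return False
    pivot = v[p]
    basis[p] = [a / pivot for a in v]
    return True


def output_span(A, F):
    """Independent vectors spanning {A(z)F : z in Σ*}, found breadth-first.
    Since A(sz)F = A(s)(A(z)F), only independent vectors need to be extended."""
    basis, found, queue = {}, [], deque([list(F)])
    while queue:
        u = queue.popleft()
        if add_to_basis(basis, u):
            found.append(u)
            queue.extend(mat_vec(A[s], u) for s in A)
    return found


def same_R_F_class(A, F, x, y):
    """x R_F y iff A(xz)F = A(yz)F for all z (I ranging over the unit vectors S),
    i.e. iff (A(x) - A(y))u = 0 for every u in output_span(A, F)."""
    n = len(F)
    D = [[a - b for a, b in zip(rx, ry)]
         for rx, ry in zip(string_matrix(A, x, n), string_matrix(A, y, n))]
    return all(c == 0 for u in output_span(A, F) for c in mat_vec(D, u))


# Hypothetical 3-state machine: states 2 and 3 share an output and behave alike.
A = {
    "a": [[Fr(1, 2), Fr(1, 4), Fr(1, 4)], [Fr(1, 3), Fr(2, 3), 0], [Fr(1, 3), 0, Fr(2, 3)]],
    "b": [[0, Fr(1, 2), Fr(1, 2)], [1, 0, 0], [1, 0, 0]],
}
F = [0, 1, 1]
```

For this machine $\Lambda \mathrel{R_F} bb$, so the relation is non-trivial, while b and bb fall in different classes.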

In order to discuss congruence relations between stochastic matrices
which may not be generated by strings of symbol matrices, a matrix congruence
analogous to $R_F$ will be defined.

Definition 1.4: The matrix reduction relation $R_M$ between n × n stochastic
matrices B and B' is given by

$B \mathrel{R_M} B' \iff IBF = IB'F'$ and there exist machines
A and A' such that $IBA(z)F = IB'A'(z)F'$ for all $I \in S$, for all $z \in \Sigma^*$.

By Definition 1.3, two strings x and y which are in the same class of the relation
$R_F$ will have equal expectations from any initial state of the machine and
will continue to have equal expectations for any finite input continuation z.
As far as expectation of output is concerned, the behavior of the machine A
is the same after either string x or string y. Returning to Example 1.1, we
can interpret x and y as program segments which produce the same final output
code and from which any continuation will give the same output code. If
intermediate outputs are suppressed, x and y in $R_F$ can be regarded as equivalent
microprograms in the machine A.

1.3 CONSTRUCTION OF THE QUOTIENT MACHINE


Definition 1.5: The equivalence class of x' under R, a congruence relation, is
given by

$R[x'] = \{x : x \mathrel{R} x'\}$.

It is a well known result [10] of automata theory that given a right
congruence relation R on $\Sigma^*$, one can construct a quotient automaton with no
output, T(R):

$T(R) = \langle a, S, M \rangle$

where

$a = R[\Lambda]$

$S = \{R[x] : x \in \Sigma^*\}$

M is a function from $S \times \Sigma$ into S such that
$M(R[x], \sigma) = R[x\sigma], \quad x \in \Sigma^*,\ \sigma \in \Sigma$.


Definition 1.6: Let $\beta \subseteq \Sigma^*$. A congruence R refines $\beta$ if

$x \mathrel{R} y \implies (x \in \beta \iff y \in \beta)$.
Theorem 1.1 (Nerode)

Let $\beta$ be a subset of $\Sigma^*$. $\beta$ is the behavior of a finite (deterministic)
automaton $A = \langle T(R), \mathcal{B} \rangle$ over $\Sigma$, where $\mathcal{B} = \{R[x] : x \in \beta\}$, iff there exists
a right congruence relation R of finite rank which refines $\beta$.

Theorem 1.2

If the congruence relation $R_F$ has finite rank, then for any cut-point $\lambda$ there is
a finite deterministic automaton A' such that the tapes accepted by A' are
T(A, $\lambda$).

Proof: Let $\beta = T(A,\lambda) = \{x : E_A(x) > \lambda\}$. Note that $R_F$ refines $\beta$, i.e.,

$x \mathrel{R_F} y \implies (x \in T(A,\lambda) \iff y \in T(A,\lambda))$. If $R_F$ has finite rank, by definition
$\{R_F[x]\}$ has a finite number of members. Using Theorem 1.1 we construct
$T(R_F) = \langle a, S, M \rangle$ and

$A' = \langle a, S, M, \mathcal{B} \rangle$, with $\mathcal{B} = \{R_F[x] : x \in \beta\}$, which accepts T(A, $\lambda$).

Q.E.D.

Definition 1.7: $rp_A(x)$ is the response of A to input string x. If A is
deterministic, $rp_A(x)$ is the state of A after an input of x. If A is
probabilistic, $rp_A(x)$ is a random variable taking on values which are states.

We use the above construction to give a sufficient condition for the
reduction of a probabilistic sequential machine to an expectation equivalent
finite deterministic machine whose output function is either a constant C(s)
for each state s or a random device $O^R(s)$ with expectation $E(O^R(s)) = C(s)$.
Theorem 1.3

The reduction relation $R_F$ defined by a probabilistic machine A has
finite rank if and only if there exists a finite deterministic machine A'
with a deterministic output $O_{A'}$ such that $O_{A'}(rp_{A'}(x)) = E_A(x) \quad \forall x \in \Sigma^*$.

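Proof: (sufficiency) By Theorem 1.1 let $A' = \langle a, S, M, \emptyset \rangle$, where $\emptyset$
is the empty set. Note that any congruence R refines $\emptyset$ vacuously. We attach an
output function $O_{A'}$ to the elements of S:

$O_{A'}(s) = E_A(x), \quad s = R_F[x]$,

which is well defined because $x \mathrel{R_F} y$ implies $E_A(x) = E_A(y)$.

For a deterministic machine, M is extended to $M^*$, which operates on
strings rather than symbols, by

$M^*(s, \sigma) = M(s, \sigma), \quad s \in S,\ \sigma \in \Sigma$

$M^*(s, \sigma x) = M^*(M^*(s, \sigma), x), \quad x \in \Sigma^*$

We note that $M^*(a, x) = rp_{A'}(x)$, so we need only show that
$rp_{A'}(x) = R_F[x]$. Let $x = i_1 i_2 \ldots i_m$ for $i_j \in \Sigma$, $j = 1,2,\ldots,m$. Then

$$\begin{aligned}
rp_{A'}(x) &= M^*(a, x) = M^*(M^*(a, i_1), i_2 \ldots i_m)\\
&= M^*(M(a, i_1), i_2 \ldots i_m)\\
&= M^*(M(R_F[\Lambda], i_1), i_2 \ldots i_m)\\
&= M^*(R_F[\Lambda i_1], i_2 \ldots i_m)\\
&= M^*(M(R_F[i_1], i_2), i_3 \ldots i_m)\\
&\ \ \vdots\\
&= R_F[i_1 i_2 \ldots i_m] = R_F[x].
\end{aligned}$$

Hence the constructed sequential machine is $A' = \langle a, S, M, O_{A'} \rangle$.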
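(necessity)

Given $A' = \langle a, S, M, O_{A'} \rangle$ such that

$O_{A'}(rp_{A'}(x)) = E_A(x) \quad \forall x \in \Sigma^*$,

we also have

$O_{A'}(rp_{A'}(xz)) = E_A(xz) \quad \forall z \in \Sigma^*$.

Let $rp_{A'}(x) = s_x$, $x \in \Sigma^*$.

Define

$s_x \mathrel{R_Q} s_y \iff x \mathrel{R_F} y$.

Let n' be the cardinality of S, which is finite. Then

$\operatorname{rank} R_Q = \operatorname{rank} R_F$,
$\operatorname{rank} R_Q \le n'$.

Hence rank $R_F$ is finite. Q.E.D.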

Instead of the deterministic function $O_{A'}$, a random device $O^R_{A'}(s)$ such that
$E(O^R_{A'}(s)) = E_A(x)$, $s = R_F[x]$, could have been used in the construction.
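The sufficiency construction of Theorem 1.3 can be sketched in Python by exploring classes breadth-first from $R_F[\Lambda]$ with the transition $M(R_F[x], \sigma) = R_F[x\sigma]$. Two assumptions are not from the text: a class is recognized by the matrix A(x)K, where the columns of K span $\{A(z)F\}$ (two strings are $R_F$-related exactly when these matrices agree), and the search stops with an error after `max_states` classes, as a guard, because it would not terminate if $R_F$ had infinite rank. The machine at the end is hypothetical.

```python
from collections import deque
from fractions import Fraction as Fr


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def add_to_basis(basis, v):
    """Echelon insertion; True if v is independent of the stored rows."""
    v = [Fr(a) for a in v]
    for p, row in basis.items():
        if v[p] != 0:
            factor = v[p]
            v = [a - factor * b for a, b in zip(v, row)]
    p = next((i for i, a in enumerate(v) if a != 0), None)
    if p is None:
        return False
    pivot = v[p]
    basis[p] = [a / pivot for a in v]
    return True


def output_span(A, F):
    """Independent vectors spanning {A(z)F : z in Σ*}."""
    basis, found, queue = {}, [], deque([list(F)])
    while queue:
        u = queue.popleft()
        if add_to_basis(basis, u):
            found.append(u)
            queue.extend(mat_vec(A[s], u) for s in A)
    return found


def quotient_machine(A, F, I, max_states=1000):
    """Sketch of <a, S, M, O> from Theorem 1.3; states are 0, 1, 2, ...,
    state 0 being R_F[Λ]. Returns the transition table M and outputs O."""
    n = len(F)
    K = output_span(A, F)

    def signature(Ax):  # identifies the class R_F[x] through A(x)K
        return tuple(tuple(mat_vec(Ax, u)) for u in K)

    reps = [[[Fr(int(i == j)) for j in range(n)] for i in range(n)]]  # A(Λ)
    index = {signature(reps[0]): 0}
    M, queue = {}, deque([0])
    while queue:
        s = queue.popleft()
        for sym in A:
            Axs = matmul(reps[s], A[sym])  # A(x sym) for the representative x of s
            key = signature(Axs)
            if key not in index:
                if len(reps) == max_states:
                    raise ValueError("too many classes; R_F may have infinite rank")
                index[key] = len(reps)
                reps.append(Axs)
                queue.append(index[key])
            M[(s, sym)] = index[key]
    O = [sum(i * e for i, e in zip(I, mat_vec(Ax, F))) for Ax in reps]  # E_A(x)
    return M, O


def run(machine, x):
    """Deterministic output O(M*(a, x))."""
    M, O = machine
    s = 0
    for sym in x:
        s = M[(s, sym)]
    return O[s]


# Hypothetical machine: "a" resets to the uniform distribution, "b" does nothing.
A = {"a": [[Fr(1, 2), Fr(1, 2)], [Fr(1, 2), Fr(1, 2)]], "b": [[1, 0], [0, 1]]}
F = [0, 1]
I = [1, 0]
machine = quotient_machine(A, F, I)
```

With such a machine, the cut-point language T(A, $\lambda$) of Theorem 1.2 would be accepted by taking as final states those s with `O[s] > λ`.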

1.4 THE PARTITION OF THE SET OF ACCESSIBLE STATE DISTRIBUTIONS INDUCED BY R_F

Definition 1.8: $V(A) = \{IA(x) : x \in \Sigma^*\}$, the set of all stochastic vectors
which can occur as distributions over the states of A. We sometimes call
V(A) the "state vectors accessible in A".


Definition 1.9: A set of vectors $V = \{v_1, v_2, \ldots\}$ is convex if for any finite
set of indices J and any $c_i \ge 0$ with $\sum_{i \in J} c_i = 1$, we have $\sum_{i \in J} c_i v_i \in V$.
The convex closure of a set of vectors V is

$\{v' : v' = \sum_{i \in J} c_i v_i,\ \sum_{i \in J} c_i = 1,\ c_i \ge 0,\ v_i \in V,\ J \text{ finite}\}$.


Theorem 1.4

If $R_F$ has finite rank r, there exist a partition $\Pi = (\Pi_1, \ldots, \Pi_r)$ of
V(A) and an integer-valued function $g(i, \sigma)$ such that

$\Pi_i A(\sigma) \subseteq \Pi_{g(i,\sigma)} \quad \forall \sigma \in \Sigma$.

Proof: We use $R_F$ to induce an equivalence on the set of stochastic
vectors accessible by the machine.

Since $R_F$ is of finite rank, we form a set of arbitrarily chosen distinct
representatives, one from each congruence class, say $x_1, \ldots, x_r$, where $x_i$ and $x_j$
are not related by $R_F$ for $i \neq j$.

Define $\Pi_j = \bigcup_{x \in R_F[x_j]} \{IA(x)\}$.

We show that $(\Pi_1, \ldots, \Pi_r)$ is a partition of V(A).

Let $W = \bigcup_{i=1}^{r} \Pi_i$.

$IA(x') \in W \implies IA(x') \in V(A)$
$IA(x') \in V(A) \implies x' \in R_F[x_k]$ for some $k = 1,\ldots,r$
$\implies IA(x') \in \Pi_k$ for some $k = 1,\ldots,r$
$\implies IA(x') \in W$

Hence

$W = \bigcup_{i=1}^{r} \Pi_i = V(A)$.

We show that $\Pi_i \cap \Pi_j = \emptyset$ for $i \neq j$. Suppose that

$v_y \in \Pi_i \cap \Pi_j, \quad i \neq j$,

where

$v_y = IA(y)$.

$IA(y) \in \Pi_i \implies y \in R_F[x_i] \implies y \mathrel{R_F} x_i$
$IA(y) \in \Pi_j \implies y \in R_F[x_j] \implies y \mathrel{R_F} x_j$

Hence we get

$y \mathrel{R_F} x_i \implies x_i \mathrel{R_F} y$ by symmetry,
and transitivity of $R_F$ gives

$x_i \mathrel{R_F} x_j$.

But since $x_i$ and $x_j$ are representatives and there is only one representative
from each class,

$i = j$,

which is a contradiction.

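Finally we show there exists an integer-valued function $g(i, \sigma)$ such that

$\Pi_i A(\sigma) \subseteq \Pi_{g(i,\sigma)}, \quad \sigma \in \Sigma$.

$\forall v_1 \in \Pi_i$: $v_1 = IA(w_1)$ for some $w_1 \in \Sigma^*$, and
$v_1 A(\sigma) = IA(w_1)A(\sigma) = IA(w_1\sigma) \in \Pi_j$

for some j, as has been shown above.

$\forall v_2 \in \Pi_i$: $v_2 = IA(w_2)$ for some $w_2 \in \Sigma^*$, and
$v_2 A(\sigma) = IA(w_2\sigma) \in \Pi_j$ for the same j,

since elements of $R_F$ have the substitution property, i.e.,

$w_1 \mathrel{R_F} x_i \implies w_1\sigma \mathrel{R_F} x_i\sigma, \quad \sigma \in \Sigma$

$w_2 \mathrel{R_F} x_i \implies w_2\sigma \mathrel{R_F} x_i\sigma, \quad \sigma \in \Sigma$

$x_i\sigma$ is an element of a class with representative $x_j$ for some j, and j
depends only on $x_i$ and $\sigma$. So there is a function $g(i, \sigma)$ such that

$g(i, \sigma) = j, \quad \sigma \in \Sigma$. Q.E.D.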

1.5 NECESSARY AND SUFFICIENT CONDITIONS THAT STRINGS BE IN THE SAME R_F CLASS

The relation $R_F$ has occupied an important place in the development of
this theory. We now study the structure of the matrices of strings which are
in the same $R_F$ class.

Definition 1.10: A relation R is non-trivial if there exist x and y in the
domain of R with $x \neq y$ such that $x \mathrel{R} y$.
Definition 1.11: The kernel of F is

$\mathrm{Kern}(F) = \{v \in \mathbb{R}^n : v \cdot F = 0\}$,

where $\mathbb{R}$ is the set of reals.


Definition 1.12: The span of a set of vectors $\{v_1, \ldots, v_r\}$ is denoted by

$\langle \{v_1, \ldots, v_r\} \rangle = \{\sum_{j=1}^{r} c_j v_j : c_j \in \mathbb{R}\}$.

Theorem 1.5

A necessary and sufficient condition for x and y to be in the same class
of the reduction relation $R_F$ is:

$(x, y) \in R_F \iff \exists$ a subspace U of Kern(F) such that

(i) $UA(z) \subseteq \mathrm{Kern}(F) \quad \forall z \in \Sigma^*$

(ii) $A(x) = A(y) + \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$ with $u_i \in U$, $i = 1,\ldots,n$.


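Proof: ($\Rightarrow$)

$(x, y) \in R_F \implies IA(xz)F = IA(yz)F \quad \forall I \in S,\ \forall z \in \Sigma^*$;

hence

$A(x)F = A(y)F$

since $S = \{(1,0,\ldots,0), \ldots, (0,\ldots,0,1)\}$ and $\Lambda \in \Sigma^*$. By linear algebra,

$A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}$, where $h_i \in \mathrm{Kern}(F)$, $i = 1,2,\ldots,n$.

Multiplying by A(z):

$A(x)A(z) = A(y)A(z) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z) \quad \forall z \in \Sigma^*$,

$A(xz) = A(yz) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z)$.

We can multiply by any initial state distribution I, so

$IA(xz)F = IA(yz)F + I \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z)F, \quad I \in S$.

But since x and y are in $R_F$,

$IA(xz)F = IA(yz)F \quad \forall z \in \Sigma^*,\ I \in S$.

Hence

$I \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} A(z)F = 0$ for every $I \in S$, i.e., $h_i A(z) \in \mathrm{Kern}(F)$, $i = 1,2,\ldots,n$.

Let $U = \langle \{h_1, \ldots, h_n\} \rangle$, a subspace of Kern(F).

We get $UA(z) \subseteq \mathrm{Kern}(F) \quad \forall z \in \Sigma^*$.

($\Leftarrow$) Notation: we denote by $A_i$ the i'th row of the matrix A.

Let $H = \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}$, where $h_i \in U \subseteq \mathrm{Kern}(F)$, $i = 1,2,\ldots,n$, and suppose

(*) $A(x) = A(y) + H$.

Multiplying the equality by I on the left and F on the right we obtain

$IA(x)F = IA(y)F + I \begin{pmatrix} h_1 F \\ \vdots \\ h_n F \end{pmatrix}, \quad I \in S$,

but $h_i F = 0$ since $h_i \in \mathrm{Kern}(F)$, $i = 1,\ldots,n$, so

$IA(x)F = IA(y)F$.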
Using (*) again and the same argument we get

$A(xz) = A(yz) + HA(z)$

$IA(xz)F = IA(yz)F + IHA(z)F = IA(yz)F$

since $HA(z) = \begin{pmatrix} h_1 A(z) \\ \vdots \\ h_n A(z) \end{pmatrix}$, where $h_i A(z) \in \mathrm{Kern}(F)$ by (i). Q.E.D.

We now simplify the restriction (i) of Theorem 1.5 to symbol matrices
rather than string matrices.


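Theorem 1.6

Let $U = \left\langle \bigcup_{(x,y) \in R_F} \{A(x)_i - A(y)_i : i = 1,\ldots,n\} \right\rangle$, the union being taken over all
$x, y \in \Sigma^*$ such that $(x, y) \in R_F$;

then

$UA(z) \subseteq \mathrm{Kern}(F)\ \ \forall z \in \Sigma^* \iff [\exists V$, a subspace of $\mathbb{R}^n$, such that:

(i) $UA(i) \subseteq V \quad \forall i \in \Sigma$
(ii) $VA(i) \subseteq V \subseteq \mathrm{Kern}(F) \quad \forall i \in \Sigma]$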

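Proof: ($\Rightarrow$) Suppose $UA(z) \subseteq \mathrm{Kern}(F)$ for all $z \in \Sigma^*$.

Let $V = \langle \{uA(z) : u \in U,\ z \in \Sigma^*\} \rangle$. Then (i) holds, and

$VA(i) = \langle \{uA(z)A(i) : u \in U,\ z \in \Sigma^*\} \rangle \subseteq V$.

Consider an arbitrary $v \in V$. There must be some finite set of indexes J and
constants $c_j$ such that

$v = \sum_{j \in J} c_j u_j A(z_j)$

by definition of V. But

$vF = \sum_{j \in J} c_j u_j A(z_j) F$, and $u_j A(z_j) F = 0$ by $UA(z) \subseteq \mathrm{Kern}(F)$,

so $vF = 0$. Hence $V \subseteq \mathrm{Kern}(F)$.

($\Leftarrow$) By (i) and (ii), $UA(z) \subseteq V \subseteq \mathrm{Kern}(F)$ for every nonempty z; for $z = \Lambda$,
$U \subseteq \mathrm{Kern}(F)$ because $A(x)F = A(y)F$ whenever $(x, y) \in R_F$.
Hence $UA(z) \subseteq \mathrm{Kern}(F)$ for all $z \in \Sigma^*$. Q.E.D.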


Definition 1.13: A subspace V is invariant under a set of linear transformations

$\{T_i : i = 1,2,\ldots,m\}$ if $V T_i \subseteq V$, $i = 1,2,\ldots,m$.

Using Theorems 1.5 and 1.6, we get the following directly:

Theorem 1.7

Strings x and y are in the same class of $R_F$ if and only if there exists
a subspace V of Kern(F) such that

(i) V is invariant under $\{A(i) : i \in \Sigma\}$

(ii) $A(x) = A(y) + H$, where $H_i \in V$, $i = 1,\ldots,n$
1.6 NECESSARY AND SUFFICIENT CONDITIONS THAT $R_F$ BE NON-TRIVIAL

A very weak necessary condition that $R_F$ have finite rank is that it at least be non-trivial. From Theorem 1.7 it is immediate that:

Theorem 1.8

The reduction relation $R_F$ is non-trivial $\iff$ there exists a subspace $V$ of $\mathrm{Kern}(F)$ such that

(i) $V$ is invariant under $\{A(i) : i \in \Sigma\}$;

(ii) $A(x) = A(y) + H$, where $h_i \in V$, $i = 1,\dots,n$;

(iii) $x \neq y$.

Hence we now know that, given strings $x$ and $y$ with $x \, R_F \, y$, the differences between the rows of the matrices $A(x)$ and $A(y)$ must be elements of a subspace $V$ with special properties. Namely, $V$ must be invariant under all symbol matrices and contained in the kernel of the output vector.


Theorem 1.9

A necessary and sufficient condition that $R_F$ be non-trivial is that the matrices $A(i)$, $i \in \Sigma$, be reducible for the same change of basis to $V$. In other words, there exists a linear transformation $W$ of the state vectors $S$ to a basis for $V$ such that

$$W^{-1}A(i)W = \begin{pmatrix} A_1^{(i)} & 0 \\ A_2^{(i)} & A_3^{(i)} \end{pmatrix},$$

where the first block of rows and columns corresponds to the basis for $V$, $0$ denotes a submatrix of zeros, and $A_1^{(i)}$, $A_2^{(i)}$ and $A_3^{(i)}$ are submatrices which have the same number of rows and columns for all $i$ in $\Sigma$.

Proof: By Theorem 1.7 and standard matrix theory [see Jacobson, Lectures in Abstract Algebra, Vol. II: Linear Algebra, Van Nostrand, New York, 1952, pp. 116-117].


Consequently, Theorem 1.9 gives us a matrix reformulation of the statement that $R_F$ be non-trivial.


Example 1.5. We now show a probabilistic sequential machine $A$ which illustrates Theorems 1.3 and 1.7. The method by which this example was generated will be discussed in a later report.


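$$A = \langle I, A(0), A(1), F \rangle,$$

where

$$I = (8/10,\ 1/10,\ 1/10,\ 0,\ 0,\ 0),$$

$$A(0) = \begin{pmatrix} 0&1&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1/2&0&1/2&0\\ 0&0&0&0&0&1\\ 0&0&3/4&0&1/4&0\\ 0&0&0&0&0&1 \end{pmatrix}, \qquad
A(1) = \begin{pmatrix} 0&0&1/8&0&7/8&0\\ 0&0&0&1&0&0\\ 0&0&4/8&0&4/8&0\\ 0&0&3/8&0&5/8&0\\ 0&0&2/8&0&6/8&0\\ 1&0&0&0&0&0 \end{pmatrix}, \qquad
F = \begin{pmatrix} 10\\5\\1\\2\\1\\2 \end{pmatrix}.$$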

The state diagram for $A$: [figure not reproduced]

where the following labeling conventions are used:

$p(K)$, $p \in [0,1]$, $K \in \Sigma$: transition with probability $p$ via symbol $K$.

$i : F_i$: output $F_i$ occurs when the machine is in state $i$.

Two parallel arcs between the same pair of states, labeled $p_1(K_1)$ and $p_2(K_2)$, are replaced by a single arc labeled $p_1(K_1), p_2(K_2)$.

We note that $00 \, R_F \, 0$, since

$$A(00) = A(0)A(0) = \begin{pmatrix} 0&1&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&5/8&0&3/8&0\\ 0&0&0&0&0&1\\ 0&0&9/16&0&7/16&0\\ 0&0&0&0&0&1 \end{pmatrix},$$


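which gives

$$(A(00) - A(0))F = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&+1/8&0&-1/8&0\\ 0&0&0&0&0&0\\ 0&0&-3/16&0&3/16&0\\ 0&0&0&0&0&0 \end{pmatrix}\begin{pmatrix}10\\5\\1\\2\\1\\2\end{pmatrix} = \begin{pmatrix}0\\0\\0\\0\\0\\0\end{pmatrix}.$$

Hence $A(00)F = A(0)F$, or $IA(00)F = IA(0)F$ for all $I$.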

Furthermore, for all $p \in [0,1]$,

$$(0, 0, p, 0, 1-p, 0)A(0) = \big(0, 0, \tfrac{3-p}{4}, 0, \tfrac{1+p}{4}, 0\big),$$

$$(0, 0, p, 0, 1-p, 0)A(1) = \big(0, 0, \tfrac{1+p}{4}, 0, \tfrac{3-p}{4}, 0\big);$$

that is,

$$W = \langle \{(0, 0, p, 0, 1-p, 0)\} \rangle$$

is invariant under the symbol matrices $A(0)$ and $A(1)$. Moreover,

$$V = \langle \{(0, 0, p, 0, -p, 0)\} \rangle \subseteq W, \qquad VA(0) = V, \qquad VA(1) = V.$$


Hence for $z \in \Sigma^*$

$$(A(00) - A(0))A(z) = c_z D, \qquad D = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&+1/8&0&-1/8&0\\ 0&0&0&0&0&0\\ 0&0&-3/16&0&3/16&0\\ 0&0&0&0&0&0 \end{pmatrix},$$

where $c_z$ is a constant depending on the string $z$, and

$$(A(00) - A(0))A(z)F = c_z DF = 0.$$

Consequently, for all $z \in \Sigma^*$ and all $I \in S$,

$$IA(00)A(z)F = IA(0)A(z)F,$$

that is, $E_A(00z) = E_A(0z)$, which shows $00 \, R_F \, 0$.
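The identity $IA(00)A(z)F = IA(0)A(z)F$ can also be spot-checked by brute force. The Python sketch below is an illustration only, and its helper names are hypothetical. It transcribes $A(0)$, $A(1)$ and $F$ of Example 1.5, assuming row 4 of $A(1)$ is $(0, 0, 3/8, 0, 5/8, 0)$ (our reading of the scan).

The check compares the column vectors $A(xz)F$ and $A(yz)F$, which give $IA(xz)F$ for every unit vector $I \in S$ at once, for all $z$ up to a chosen length. A finite check of this kind supports, but does not prove, $x \, R_F \, y$; the proof is the invariant-subspace argument in the text.

```python
from fractions import Fraction as Fr
from itertools import product

h, q, e = Fr(1, 2), Fr(1, 4), Fr(1, 8)
A = {
    "0": [[0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, h, 0, h, 0],
          [0, 0, 0, 0, 0, 1], [0, 0, 3 * q, 0, q, 0], [0, 0, 0, 0, 0, 1]],
    "1": [[0, 0, e, 0, 7 * e, 0], [0, 0, 0, 1, 0, 0], [0, 0, h, 0, h, 0],
          [0, 0, 3 * e, 0, 5 * e, 0], [0, 0, q, 0, 3 * q, 0], [1, 0, 0, 0, 0, 0]],
}
F = [10, 5, 1, 2, 1, 2]

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def AF(word):
    # A(x)F evaluated right to left: A(x_1)(A(x_2)(...(A(x_m)F))).
    v = list(F)
    for s in reversed(word):
        v = mat_vec(A[s], v)
    return v

def agree_up_to(x, y, max_len=6):
    # True if A(xz)F == A(yz)F for every z over {0,1} with |z| <= max_len.
    return all(AF(x + "".join(z)) == AF(y + "".join(z))
               for m in range(max_len + 1)
               for z in product("01", repeat=m))

print(agree_up_to("00", "0"))   # the pair 00, 0 discussed above
print(agree_up_to("0", "1"))    # a pair with different expectations
```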

By the same method one can show that

$$10 \, R_F \, 1, \qquad 011 \, R_F \, 11, \qquad 01011 \, R_F \, 11, \qquad 111 \, R_F \, 11, \qquad 01010 \, R_F \, 0,$$

and all strings are in the classes

$$R_F[\Lambda],\ R_F[0],\ R_F[1],\ R_F[11],\ R_F[01],\ R_F[010],\ R_F[0101],$$

which means that $R_F$ has finite rank.

We compute the expectations and construct the expectation equivalent deterministic machine $A'$. Note that the values of expectation depend on the initial state $I$.

$E_A(\Lambda) = IA(\Lambda)F = IF = 8.6$

$E_A(0) = (0,\ 9/10,\ 1/20,\ 0,\ 1/20,\ 0)F = 4.6$

$E_A(1) = (0,\ 0,\ 3/20,\ 2/20,\ 15/20,\ 0)F = 1.1$

$E_A(01) = (0,\ 0,\ 3/80,\ 72/80,\ 5/80,\ 0)F = 1.9$

$E_A(10) = (0,\ 0,\ 3/20,\ 0,\ 15/20,\ 2/20)F = 1.1 = E_A(1)$ (since $10 \, R_F \, 1$)

$E_A(11) = (0,\ 0,\ 9/40,\ 0,\ 31/40,\ 0)F = 1.0$

$E_A(010) = 1.9$

$E_A(0101) = 9.1$
Hence the expectation equivalent deterministic machine is $A'$: [figure not reproduced]

We note that $A'$ has 7 states while $A$ has just 6 states. The deterministic cycle 0101 appears in both machines.
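The expectations listed above can be recomputed mechanically as $E_A(x) = IA(x)F$. The following is a minimal Python sketch with exact fractions, using the same transcription of Example 1.5. It again assumes row 4 of $A(1)$ is $(0, 0, 3/8, 0, 5/8, 0)$; the listed values do not depend on how that row splits its mass between states 3 and 5, both of which output 1.

```python
from fractions import Fraction as Fr

h, q, e = Fr(1, 2), Fr(1, 4), Fr(1, 8)
A = {
    "0": [[0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, h, 0, h, 0],
          [0, 0, 0, 0, 0, 1], [0, 0, 3 * q, 0, q, 0], [0, 0, 0, 0, 0, 1]],
    "1": [[0, 0, e, 0, 7 * e, 0], [0, 0, 0, 1, 0, 0], [0, 0, h, 0, h, 0],
          [0, 0, 3 * e, 0, 5 * e, 0], [0, 0, q, 0, 3 * q, 0], [1, 0, 0, 0, 0, 0]],
}
F = [10, 5, 1, 2, 1, 2]
I = [Fr(8, 10), Fr(1, 10), Fr(1, 10), 0, 0, 0]

def distribution(x):
    # IA(x): push the initial distribution through the symbol matrices left to right.
    dist = list(I)
    for s in x:
        M = A[s]
        dist = [sum(dist[k] * M[k][j] for k in range(len(dist))) for j in range(len(dist))]
    return dist

def expectation(x):
    # E_A(x) = IA(x)F
    return sum(p * f for p, f in zip(distribution(x), F))

for x in ["", "0", "1", "01", "10", "11", "010", "0101"]:
    print(x or "Λ", expectation(x))
```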
2. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS N-MOMENT EQUIVALENT TO A MACHINE WITH DETERMINISTIC SWITCHING AND RANDOM OUTPUTS

In this section the concept of expectation equivalence is generalized to N-moment equivalence. A congruence relation $R_F^N$ is defined which partitions the input strings into classes whose members all produce the same expectation and first $N-1$ central moments in a given machine. If $R_F^N$ has finite rank, a finite quotient machine can be constructed which is deterministic, with each state corresponding to a congruence class. Each state can be connected to a random device having the same expectation and $N-1$ central moments as the class represented by the state, giving a deterministic machine with random outputs. The deterministic machine constructed is N-moment equivalent to the probabilistic machine.

After the first theorem, which gives a necessary and sufficient condition that two strings be in the same $R_F^N$ class, it is clear that a simple substitution gives generalizations of the results of Section 1. These generalizations are presented in this section without proofs.

2.1 DISTRIBUTION EQUIVALENCE: $\equiv_D$

The random variable structure of probabilistic sequential machines will be investigated in this section.

Definition 2.1: $O_A(x)$ = the output random variable of the machine $A$ after a string $x$ has occurred as input.

The distribution of $O_A(x)$ is $IA(x)$, and the values of $O_A(x)$ are the entries of $F$.
Definition 2.2: $A$ and $A'$ are distribution equivalent, written $A \equiv_D A'$, if for

$$J_x = \{j : (IA(x))_jF_j \neq 0\}, \qquad J'_x = \{j : (I'A'(x))_jF'_j \neq 0\}$$

there is a 1-1 map $h$ between $J'_x$ and $J_x$ such that

$$(IA(x))_{h(j)} = (I'A'(x))_j, \qquad F_{h(j)} = F'_j, \qquad j \in J'_x,\ x \in \Sigma^*.$$

Distribution equivalence corresponds to the conventional definition of equivalence for discrete random variables, except for random variables with $F_i = F_j$ for $i \neq j$.

Referring back to Example 0.2, note that two chemical cells are distribution equivalent if:

(1) we neglect those partitioned areas which have either zero efficiency or a zero fraction of the catalyst;

(2) of the remaining partitioned areas, there is a correspondence between the partitioned areas of one cell and the other such that corresponding areas have the same fraction of catalyst, regardless of the sequence of controls entering the cells; and

(3) corresponding partitioned areas have the same efficiencies.


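2.2 MOMENTS OF THE OUTPUT RANDOM VARIABLE

Definition 2.3: Let

$$F = \begin{pmatrix} F_1 \\ \vdots \\ F_n \end{pmatrix}, \qquad F_i \in \mathbb{R},\ i = 1,2,\dots,n,$$

and let $(F^k)$ denote the column vector $(F_1^k, \dots, F_n^k)^T$. Then the $i$'th central moment of $O_A(x)$ is

$$\mu_i^A(x) = E_A[(O_A(x) - E_A(x))^i], \qquad i = 2,3,\dots$$

Sometimes $\mu_1^A(x) = E_A(x)$ will be used informally.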


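Theorem 2.1

$$\mu_i^A(x) = \sum_{k=0}^{i} \binom{i}{k}(-1)^k\, IA(x)(F^{i-k})\, E_A(x)^k, \qquad i = 2,3,\dots$$

Proof: By the binomial theorem,

$$(O_A(x) - E_A(x))^i = \sum_{k=0}^{i} \binom{i}{k}(-1)^k\, O_A(x)^{i-k} E_A(x)^k.$$

To compute the expectation of the discrete random variable $O_A(x)^{i-k}$, note that it has the same distribution as $O_A(x)$ but takes on the values $F_1^{i-k}, \dots, F_n^{i-k}$, so that $E_A[O_A(x)^{i-k}] = IA(x)(F^{i-k})$ for $i \neq k$. Hence

$$\mu_i^A(x) = \sum_{k=0}^{i-1} \binom{i}{k}(-1)^k\, IA(x)(F^{i-k})\, E_A(x)^k + (-1)^i E_A(x)^i. \qquad \text{Q.E.D.}$$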
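Theorem 2.1 expresses every central moment through the raw moments $IA(x)(F^r)$. The Python sketch below is our illustration, and its function names are hypothetical. It evaluates the right-hand side of Theorem 2.1 and compares the result with the definition $E_A[(O_A(x) - E_A(x))^i]$, summed state by state. As a test case it uses the distribution $IA(0) = (0, 9/10, 1/20, 0, 1/20, 0)$ and $F = (10, 5, 1, 2, 1, 2)^T$ from Example 1.5.

```python
from fractions import Fraction as Fr
from math import comb

def raw_moment(dist, F, r):
    # IA(x)(F^r): expected value of O_A(x)**r when IA(x) = dist.
    return sum(p * f ** r for p, f in zip(dist, F))

def central_moment_thm21(dist, F, i):
    # Theorem 2.1: sum_{k=0}^{i} C(i,k) (-1)^k IA(x)(F^{i-k}) E_A(x)^k
    E = raw_moment(dist, F, 1)
    return sum(comb(i, k) * (-1) ** k * raw_moment(dist, F, i - k) * E ** k
               for k in range(i + 1))

def central_moment_direct(dist, F, i):
    # Definition 2.3: E_A[(O_A(x) - E_A(x))^i]
    E = raw_moment(dist, F, 1)
    return sum(p * (f - E) ** i for p, f in zip(dist, F))

dist = [0, Fr(9, 10), Fr(1, 20), 0, Fr(1, 20), 0]   # IA(0) in Example 1.5
F = [10, 5, 1, 2, 1, 2]
for i in range(2, 7):
    print(i, central_moment_thm21(dist, F, i), central_moment_direct(dist, F, i))
```

The $k = i$ term uses $IA(x)(F^0) = 1$, which holds because $IA(x)$ is a stochastic vector.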


2.3 SPECIAL PROPERTIES OF RABIN PROBABILISTIC AUTOMATA

Definition 2.4: A Rabin probabilistic automaton [4] is a probabilistic sequential machine such that $F_i = 0$ or $F_i = 1$, $i = 1,2,\dots,n$.

We now observe that Rabin probabilistic automata have rather special features as far as the random variable of the output is concerned.
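Corollary 2.1: For a Rabin probabilistic automaton $A$,

$$\mu_i^A(x) = \sum_{k=0}^{i-1} \binom{i}{k}(-1)^k E_A(x)^{k+1} + (-1)^i E_A(x)^i, \qquad i = 2,3,\dots$$

Proof: $F_j = 0$ or $1$, hence

$$(F^{i-k}) = F \qquad \text{for } i \neq k,$$

so $IA(x)(F^{i-k}) = E_A(x)$, and the result follows from Theorem 2.1.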

Corollary 2.2: If $E_A(x) = E_A(y)$ for some Rabin probabilistic automaton $A$, then all central moments for $x$ and $y$ are also equal, i.e.,

$$\mu_i^A(x) = \mu_i^A(y), \qquad i = 2,3,\dots$$

Note: for $i = 2$ we get that the variances of the outputs are equal.

Corollary 2.3: If two Rabin probabilistic automata $A$ and $A'$ are expectation equivalent, then

$$\mu_i^A(x) = \mu_i^{A'}(x), \qquad i = 2,3,\dots, \quad \forall x \in \Sigma^*.$$
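For a Rabin automaton the output is a 0/1 random variable, so Corollary 2.1 says every central moment is a polynomial in $E_A(x)$ alone. The Python sketch below checks this for two hypothetical three-state distributions with output vector $F = (1, 0, 0)^T$. The distributions and function names are our own, chosen only so that the expectations coincide.

```python
from fractions import Fraction as Fr
from math import comb

def central_moment(dist, F, i):
    E = sum(p * f for p, f in zip(dist, F))
    return sum(p * (f - E) ** i for p, f in zip(dist, F))

def rabin_moment(E, i):
    # Corollary 2.1: sum_{k=0}^{i-1} C(i,k) (-1)^k E^(k+1) + (-1)^i E^i
    return sum(comb(i, k) * (-1) ** k * E ** (k + 1) for k in range(i)) + (-1) ** i * E ** i

F = [1, 0, 0]                           # Rabin outputs: every F_j is 0 or 1
d1 = [Fr(1, 2), Fr(1, 2), 0]            # hypothetical state distributions
d2 = [Fr(1, 2), Fr(2, 5), Fr(1, 10)]    # with the same expectation 1/2
for i in range(2, 6):
    print(i, central_moment(d1, F, i), central_moment(d2, F, i), rabin_moment(Fr(1, 2), i))
```

For $i = 2$ the formula reduces to $E_A(x)(1 - E_A(x))$, the familiar Bernoulli variance.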

2.4 THE CONCEPT OF N-MOMENT EQUIVALENCE

Even if two machines are expectation equivalent, the statistics of their behavior may be so different that for many purposes we would not want to consider the machines behaviorally equivalent. Returning to Example 0.1, two slot machines can be expectation equivalent, meaning that the average payoff is the same for both. Even so, one can be much more desirable than the other for a player of limited resources, since such a player might have a far longer average time until "gambler's ruin" on one machine than on the other. Hence, in order to lump into the same class those machines whose statistics of behavior are somewhat alike, we introduce the notion of N-moment equivalence.


Definition 2.3: Probabilistic sequential machines $A$ and $A'$ are N-moment equivalent, written $A \equiv_N A'$, if

$$E_A(x) = E_{A'}(x),$$

$$\mu_i^A(x) = \mu_i^{A'}(x), \qquad i = 2,\dots,N, \quad \text{for all } x \text{ in } \Sigma^*.$$

2.5 THE RELATIONSHIP BETWEEN $\equiv_D$ AND $\equiv_N$


Theorem 2.2

For probabilistic sequential machines $A$ and $A'$,

$$A \equiv_D A' \Rightarrow A \equiv_N A' \quad \text{for all finite } N.$$

Proof: Distribution equivalence means there exists an $h$ such that

$$F_{h(i)} = F'_i, \qquad (IA(x))_{h(i)} = (I'A'(x))_i \qquad \forall x \in \Sigma^*$$

when $(I'A'(x))_iF'_i \neq 0$. Hence

$$\sum_{i=1}^n (IA(x))_iF_i = \sum_{i=1}^n (I'A'(x))_iF'_i,$$

or

$$E_A(x) = E_{A'}(x),$$

which is expectation equivalence. For any finite $N$,

$$(F_{h(i)})^N = (F'_i)^N,$$

so in the same way $IA(x)(F^N) = I'A'(x)(F'^N)$. The fact that

$$\mu_N^A(x) = \mu_N^{A'}(x)$$

then follows by inspection of Theorem 2.1. Symbolically, we have shown $A \equiv_D A' \Rightarrow A \equiv_N A'$ for any $N$. How close one can come to a converse of Theorem 2.2 depends on the form of the entries of $F$.

Lemma 2.1 (Gantmacher [11])

Given a sequence $s_0, s_1, \dots$ of real numbers, suppose one determines positive numbers $r_1 > 0, r_2 > 0, \dots, r_m > 0$ and

$$\infty > v_m > v_{m-1} > \dots > v_1 > 0$$

such that the following equations hold:

$$s_p = \sum_{j=1}^{m} r_j v_j^p, \qquad p = 0,1,2,\dots \qquad (*)$$

Then the solution to (*) is unique.

We can apply the lemma to get the following partial converse.

Theorem 2.3 Suppose machines $A$ and $A'$ meet the following requirements (letting $h(i) = i$ w.l.o.g.):

(i) $(IA(x))_iF_i = 0$ iff $(I'A'(x))_iF'_i = 0$, $i = 1,2,\dots,n$;

(ii) all states in a given machine have distinct output symbols;

(iii) $E_A(x) = E_{A'}(x)$ and $\mu_i^A(x) = \mu_i^{A'}(x)$, $i = 2,3,\dots$, $\forall x \in \Sigma^*$.

Then $A$ and $A'$ are distribution equivalent.

Proof: We use Lemma 2.1. Since the central moments and expected values of output are equal for any string, the moments of $O_A(x)$ and $O_{A'}(x)$ about zero are equal for any string:

$$s_0 = \sum_{i : F_i \neq 0} (IA(x))_i,$$

$$s_1 = E_A(x) = E_{A'}(x),$$

$$s_2 = \mu_2^A(x) + E_A(x)^2 = \mu_2^{A'}(x) + E_{A'}(x)^2,$$

$$\vdots$$

We discard those components whose contribution to the moment is zero and relabel the non-zero components by the index $j$. Let

$$J = \{j : (IA(x))_jF_j \neq 0\}.$$

Because of assumption (i) we also have

$$J = \{j : (I'A'(x))_jF'_j \neq 0\}.$$

Hence

$$s_p = \sum_{j \in J} (IA(x))_j(F_j)^p = \sum_{j \in J} (I'A'(x))_j(F'_j)^p, \qquad p = 0,1,2,\dots$$

By the lemma the solution is unique:

$$(IA(x))_j = (I'A'(x))_j, \qquad F_j = F'_j, \qquad j \in J.$$

Hence $A$ and $A'$ are distribution equivalent.

Example 2.1

Condition (ii) of Theorem 2.3 is a necessary condition, as shown by the following:

$$IA(x) \neq I'A'(x) = (.5,\ .4,\ .1),$$

$$E_A(x) = IA(x)F = .5,$$

$$E_{A'}(x) = I'A'(x)F' = .5.$$

Since $A$ and $A'$ are Rabin automata, by Corollary 2.3

$$\mu_i^A(x) = \mu_i^{A'}(x), \qquad i = 2,3,\dots$$

However, $A$ and $A'$ have different distributions over states for the string $x$.

2.6 THE N-REDUCTION RELATION

Definition 2.5: The N-reduction relation $R_F^N$ is defined by $x \, R_F^N \, y$ if for all $I$ in $S$

$$E_A(xz) = E_A(yz) \quad\text{and}\quad \mu_i^A(xz) = \mu_i^A(yz) \qquad \forall z \in \Sigma^*,\ i = 2,3,\dots,N.$$

The relation $R_F^N$ is a congruence relation, and $R_F^1 = R_F$. Elements in the same congruence class of $R_F^N$ have equal expectations and equal first $N-1$ central moments. Hence the machine $\Sigma^* \bmod R_F^N$ can have random devices attached to its states (which are the classes $R_F^N[x]$) such that the first $N-1$ central moments and the expectation of each device are the same as those of the congruence class represented by the state. The resulting machine has deterministic switching and random output functions and is equivalent by $\equiv_N$ to the probabilistic machine defining $R_F^N$.

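Theorem 2.4

The N-reduction relation is non-trivial iff there are strings $x \neq y$ such that

$$A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}, \qquad \text{where } \langle h_1, \dots, h_n \rangle \subseteq \bigcap_{i=1}^N \mathrm{Kern}(F^i),$$

and

$$\langle h_1, \dots, h_n \rangle A(z) \subseteq \bigcap_{i=1}^N \mathrm{Kern}(F^i) \qquad \forall z \in \Sigma^*.$$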
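Proof: Suppose that $R_F^N$ is non-trivial, with $x \, R_F^N \, y$ and $x \neq y$. Then

$$E_A(x) = E_A(y) \iff IA(x)F = IA(y)F \ \ \forall I \in S \iff A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}, \quad h_i \in \mathrm{Kern}(F),\ i = 1,2,\dots,n.$$

Next,

$$\mu_2^A(x) = IA(x)(F^2) - E_A(x)^2 = IA(y)(F^2) - E_A(y)^2 = \mu_2^A(y),$$

so

$$IA(x)(F^2) = IA(y)(F^2) \ \ \forall I \in S \iff A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}, \quad h_i \in \mathrm{Kern}(F^2).$$

For any $i$, $\mu_i^A(x)$ can be written as a recursive function of $IA(x)(F^i)$ and smaller powers of $F$, i.e.,

$$\mu_i^A(x) = IA(x)(F^i) + \sum_{k=1}^{i-1} (-1)^k \binom{i}{k} IA(x)(F^{i-k})E_A(x)^k + (-1)^iE_A(x)^i.$$

Hence, by induction, we assume

$$IA(x)(F^k) = IA(y)(F^k), \qquad k = 1,2,\dots,i-1,\ \forall I.$$

Then

$$\mu_i^A(x) = IA(x)(F^i) + P, \qquad \mu_i^A(y) = IA(y)(F^i) + P,$$

with the same remainder $P$ for $x$ and $y$, so

$$\mu_i^A(x) = \mu_i^A(y) \iff IA(x)(F^i) = IA(y)(F^i) \ \ \forall I \iff A(x) = A(y) + \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}, \quad h_j \in \mathrm{Kern}(F^i).$$

The rest of the proof is analogous to that of Theorem 1.5. Q.E.D.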

If we substitute $R_F^N$ for $R_F$ and $\bigcap_{i=1}^N \mathrm{Kern}(F^i)$ for $\mathrm{Kern}(F)$, the proofs of Theorems 1.4, 1.6, 1.8 and 1.9 go through exactly as before. We state the dual theorems which are obtained.


Theorem 1.4D

If $R_F^N$ has finite rank $r$, there exists a partition $\pi = (\pi_1, \dots, \pi_r)$ on $V(A)$ and an integer-valued function $\gamma(l,m)$ such that

$$\pi_i A(\sigma) \subseteq \pi_{\gamma(i,\sigma)}, \qquad i = 1,2,\dots,r,\ \sigma \in \Sigma.$$

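Theorem 1.6D

Let

$$U = \Big\langle \bigcup_{(x,y) \in R_F^N} \{A(x)_i - A(y)_i : i = 1,2,\dots,n\} \Big\rangle.$$

Then

$$UA(z) \subseteq \bigcap_{j=1}^N \mathrm{Kern}(F^j) \ \ \forall z \in \Sigma^* \iff \text{there exists } V, \text{ a subspace of } \mathbb{R}^n, \text{ such that for any } i \in \Sigma:$$

(i) $UA(i) \subseteq V$;

(ii) $VA(i) \subseteq V \subseteq \bigcap_{j=1}^N \mathrm{Kern}(F^j)$.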
Theorem 1.8D

$R_F^N$ is non-trivial $\iff$ there exists a subspace $V$ such that

(i) $V \subseteq \bigcap_{j=1}^N \mathrm{Kern}(F^j)$;

(ii) $V$ is invariant under $\{A(i) : i \in \Sigma\}$;

(iii) $A(x) = A(y) + H$, where $h_i \in V$, for some $x \neq y$.


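Example 2.2

We extend Example 1.5 to illustrate Theorems 1.4D and 1.8D:

$$\langle \{(0, 0, p, 0, -p, 0)\} \rangle \subseteq \bigcap_{n=1}^N \mathrm{Kern}(F^n), \qquad F^n = \begin{pmatrix} 10^n \\ 5^n \\ 1 \\ 2^n \\ 1 \\ 2^n \end{pmatrix},$$

for any finite $N$.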

Hence we can replace the output from any state with a random device with the same first $N$ central moments as the probabilistic sequential machine.
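Membership in $\bigcap_{n=1}^N \mathrm{Kern}(F^n)$ is easy to test directly, and the test shows why this intersection can be strictly smaller than $\mathrm{Kern}(F)$. In the Python sketch below, which uses the output vector $F$ of Example 1.5, the vector $(0, 0, 1, 0, -1, 0)$ passes for every power. By contrast, the vector $(1, -2, 0, 0, 0, 0)$, which we chose purely as a hypothetical comparison, lies in $\mathrm{Kern}(F)$ but not in $\mathrm{Kern}(F^2)$.

```python
F = [10, 5, 1, 2, 1, 2]   # output vector of Example 1.5

def in_kernels_up_to(v, F, N):
    # True if v (F^n) = 0 for n = 1, ..., N, where (F^n) takes componentwise powers.
    return all(sum(a * f ** n for a, f in zip(v, F)) == 0 for n in range(1, N + 1))

v = [0, 0, 1, 0, -1, 0]   # spans V in Example 2.2
w = [1, -2, 0, 0, 0, 0]   # hypothetical contrast vector: 10 - 2*5 = 0, but 100 - 2*25 != 0

print(in_kernels_up_to(v, F, 20))
print(in_kernels_up_to(w, F, 1), in_kernels_up_to(w, F, 2))
```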

By way of illustration, we compute the variances. Note that here the classes of $R_F^N$ are also the classes of $R_F$.

$$\mu_2(\Lambda) = (8/10,\ 1/10,\ 1/10,\ 0,\ 0,\ 0)\begin{pmatrix}100\\25\\1\\4\\1\\4\end{pmatrix} - (8.6)^2 = 8.64$$
Likewise, we get

$\mu_2(0) = 1.44$

$\mu_2(1) = .09$

$\mu_2(01) = .09$

$\mu_2(10) = .09$

$$\mu_2(11) = (0,\ 0,\ 9/40,\ 0,\ 31/40,\ 0)\begin{pmatrix}100\\25\\1\\4\\1\\4\end{pmatrix} - (1.0)^2 = 0.0$$

$$\mu_2(010) = (0,\ 0,\ 21/320,\ 0,\ 11/320,\ 72/80)\begin{pmatrix}100\\25\\1\\4\\1\\4\end{pmatrix} - 3.61 = 0.09$$

$$\mu_2(0101) = (72/80,\ 0,\ 53/1280,\ 0,\ 75/1280,\ 0)\begin{pmatrix}100\\25\\1\\4\\1\\4\end{pmatrix} - (9.1)^2 = 7.29$$


Hence a machine $A'$ which has the same expectation and variance for each string can be constructed with deterministic switching and random output devices. These devices are symbolized by

$$S : (e, \mu)$$

and are attached to states $S$; each supplies random numbers with mean $e$ and variance $\mu$. The machine $A'$ is then just the machine of Example 1.5 with the outputs connected to devices such as the above.


[Figure 2.1 not reproduced; in it, the state marked m is the initial state of $A'$.]

Fig. 2.1 Machine $A'$ which has the same expectation and variance for all strings as probabilistic machine $A$ of Example 1.5.
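The variances above follow from $\mu_2 = IA(x)(F^2) - E_A(x)^2$. The Python sketch below recomputes several of them with exact fractions, starting from the state distributions quoted in the text. The dictionary keys and the function name are our own.

```python
from fractions import Fraction as Fr

F = [10, 5, 1, 2, 1, 2]

def variance(dist):
    # mu_2 = IA(x)(F^2) - E_A(x)^2
    E = sum(p * f for p, f in zip(dist, F))
    return sum(p * f * f for p, f in zip(dist, F)) - E * E

dists = {
    "Λ":    [Fr(8, 10), Fr(1, 10), Fr(1, 10), 0, 0, 0],
    "0":    [0, Fr(9, 10), Fr(1, 20), 0, Fr(1, 20), 0],
    "1":    [0, 0, Fr(3, 20), Fr(2, 20), Fr(15, 20), 0],
    "11":   [0, 0, Fr(9, 40), 0, Fr(31, 40), 0],
    "010":  [0, 0, Fr(21, 320), 0, Fr(11, 320), Fr(72, 80)],
    "0101": [Fr(72, 80), 0, Fr(53, 1280), 0, Fr(75, 1280), 0],
}
for x, d in dists.items():
    print(x, float(variance(d)))
```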


Example 2.3. Probabilistic sequential machines $A$ and $A'$ such that

$$E_A(x) = E_{A'}(x), \qquad \mu_i^A(x) = \mu_i^{A'}(x), \qquad \forall x \in \Sigma^*,\ i = 2,3,\dots$$


A(0)  = 

1 

0 

0  \ 

[3/5 

1/5 

l/5\ 

1/2 

1/4 

1/4 

1  A(l)  = 

1/5 

4/5 

0 

[l/4 

0 

5/^1 

i  V5 

1/5 

0  ' 
A'(0) =

1  0 

0 

1 1 

1  V5 

1/5 

0  \ 

1  ° 

1/4 

3/4  j 

A'(l)  =  1 

1  ^ 

4/5 

1/5 

io 

0 

1  / 

1  V5 

1/5 

0  / 

For both machines

$$F = \begin{pmatrix} F_1 \\ F_2 \\ F_1 \end{pmatrix}, \qquad F_1, F_2 \text{ arbitrary real numbers.}$$

Note that $R_F$ is non-trivial since there is an invariant subspace

$$U = \langle \{(1, 0, -1)\} \rangle$$

such that

$$[A(0) - A'(0)]_j \in U, \qquad [A(1) - A'(1)]_j \in U, \qquad j = 1,2,\dots,n.$$


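Theorem 1.9D

$R_F^N$ is non-trivial $\iff$ the symbol matrices $A(i)$, $i \in \Sigma$, are reducible for the same change of basis (for $V$). That is, there is a linear transformation $W$ from the state basis $S$ to a basis for $V$ such that

$$W^{-1}A(i)W = \begin{pmatrix} A_1^{(i)} & 0 \\ A_2^{(i)} & A_3^{(i)} \end{pmatrix},$$

where $0$ denotes a block of all zeros of the same size for all symbols $i$, and $V \subseteq \bigcap_{j=1}^N \mathrm{Kern}(F^j)$.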
3. THE NOTION OF INDISTINGUISHABILITY AS A CRITERION OF BEHAVIORAL EQUIVALENCE

Suppose probabilistic sequential machines $A$ and $A'$ are behaviorally equivalent in an intuitive sense, taking into consideration how machines are built and repaired. Then one would expect them to be interchangeable as a submachine of any larger machine. Indistinguishability of two machines in any machine into which they can be plugged is a strong criterion, and we shall investigate its ramifications. The following example [9] illustrates how the notion of equivalence through accepting the same set of tapes fails to meet this indistinguishability requirement.


3.1 EXAMPLE OF TWO DISTRIBUTION EQUIVALENT MACHINES WHICH PERFORM DIFFERENTLY AS COMPONENTS OF A MACHINE

$$A_1 = \langle I_1, A_1(0), A_1(1), F_1 \rangle,$$

where

$$A_1(0) = A_1(1) = \begin{pmatrix} 0&1/2&1/2&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1\\ 0&0&0&0&1\\ 0&0&0&0&1 \end{pmatrix}, \qquad F_1 = \begin{pmatrix}0\\0\\1\\1\\0\end{pmatrix}, \qquad I_1 = (1, 0, 0, 0, 0),$$

and

$$A_2 = \langle I_2, A_2(0), A_2(1), F_2 \rangle,$$

where $I_2 = I_1$, $F_2 = F_1$, and

$$A_2(0) = A_2(1) = \begin{pmatrix} 0&1/2&1/2&0&0\\ 0&0&0&1/2&1/2\\ 0&0&0&1/2&1/2\\ 0&0&0&0&1\\ 0&0&0&0&1 \end{pmatrix}.$$

Note that machines $A_1$ and $A_2$ happen to be independent of the input, as $A_1(0) = A_1(1)$ and $A_2(0) = A_2(1)$. Hence both are Markov processes.


TABLE 3.1

COMPARISON OF MACHINES A1 AND A2

x                   E_A1(x)   I1 A1(x)              E_A2(x)   I2 A2(x)
Λ                   0         (1, 0, 0, 0, 0)       0         (1, 0, 0, 0, 0)
0 or 1              1/2       (0, 1/2, 1/2, 0, 0)   1/2       (0, 1/2, 1/2, 0, 0)
00, 01, 10 or 11    1/2       (0, 0, 0, 1/2, 1/2)   1/2       (0, 0, 0, 1/2, 1/2)
all x: lg(x) ≥ 3    0         (0, 0, 0, 0, 1)       0         (0, 0, 0, 0, 1)

From the above table we see that $A_1$ and $A_2$ are distribution equivalent as well as expectation equivalent. We will later show the existence of a machine which behaves differently with $A_1$ and $A_2$ as submachines, despite the fact that the state behaviors of $A_1$ and $A_2$ are Markov processes.
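Table 3.1 can be regenerated by pushing $I_1 = I_2$ through the two Markov matrices. The Python sketch below does this with exact fractions. It takes the output vector $F_1 = F_2 = (0, 0, 1, 1, 0)^T$, which is consistent with the entries of Tables 3.1 and 3.2, and its helper name is hypothetical.

```python
from fractions import Fraction as Fr

h = Fr(1, 2)
M1 = [[0, h, h, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
M2 = [[0, h, h, 0, 0], [0, 0, 0, h, h], [0, 0, 0, h, h], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
F = [0, 0, 1, 1, 0]
I = [1, 0, 0, 0, 0]

def state_distribution(M, steps):
    # I M^steps; the input symbols are irrelevant because A(0) = A(1).
    dist = list(I)
    for _ in range(steps):
        dist = [sum(dist[k] * M[k][j] for k in range(5)) for j in range(5)]
    return dist

for t in range(5):
    d1, d2 = state_distribution(M1, t), state_distribution(M2, t)
    E = sum(p * f for p, f in zip(d1, F))
    print(t, d1 == d2, E)
```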

Definition 3.1. $A \to B$ denotes the machine obtained by plugging the outputs of $A$ into the inputs of $B$, subject to the provision that the inputs of $B$ be compatible with the outputs of $A$.

Definition 3.2. $A$ and $A'$ are tape equivalent machines, written $A \equiv_T A'$, if for some specified $\lambda$

$$T(A, \lambda) = T(A', \lambda).$$

Definition 3.3. $A$ and $A'$ are tape indistinguishable for a class $\mathcal{C}$ of machines if

$$T(A \to C, \lambda) = T(A' \to C, \lambda)$$

for all $\lambda$ and $C \in \mathcal{C}$.

We may sometimes let $\mathcal{C}$ be a larger class than finite deterministic or probabilistic automata.

Theorem 3.1

If probabilistic sequential machines $A$ and $A'$ are distribution equivalent, they are not necessarily tape-indistinguishable for the class of finite deterministic automata.

Proof (by example): Let $C$ be a finite deterministic machine which accepts 01 and 10 with probability 1 and all other tapes with probability 0. We tabulate the expectation of $A_1 \to C$ and $A_2 \to C$ in Table 3.2.

TABLE 3.2

EXPECTATION OF A1 → C AND A2 → C

x     E_(A1→C)   E_(A2→C)
00    0          1/4
01    1/2        1/4
10    1/2        1/4
11    0          1/4


Hence $T(A_1 \to C, \lambda) \neq T(A_2 \to C, \lambda)$ for any $\lambda \in (1/2, 1)$. The reason for this difference is that the conditional probabilities of the output random variables differ for $A_1$ and $A_2$. For example,

$$\mathrm{Prob}\{O_{A_1}(01) = 1 \mid O_{A_1}(1) = 0\} = 1,$$

while

$$\mathrm{Prob}\{O_{A_2}(01) = 1 \mid O_{A_2}(1) = 0\} = 1/2.$$
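The difference between $A_1$ and $A_2$ shows up only in the joint law of successive outputs. The Python sketch below enumerates state paths of length two for both machines and recovers the numbers behind Table 3.2 and the two conditional probabilities just quoted. It assumes $F_1 = F_2 = (0, 0, 1, 1, 0)^T$, which is consistent with Tables 3.1 and 3.2, and its function names are ours. Since the machines ignore their inputs, the input string plays no role here.

```python
from fractions import Fraction as Fr

h = Fr(1, 2)
M1 = [[0, h, h, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
M2 = [[0, h, h, 0, 0], [0, 0, 0, h, h], [0, 0, 0, h, h], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
F = [0, 0, 1, 1, 0]
I = [1, 0, 0, 0, 0]

def output_pair_law(M):
    # Joint law of the outputs observed after the first and the second transition.
    law = {}
    for i in range(5):
        for j in range(5):
            for k in range(5):
                p = I[i] * M[i][j] * M[j][k]
                if p:
                    law[(F[j], F[k])] = law.get((F[j], F[k]), 0) + p
    return law

def prob_second_is_1_given_first_is_0(M):
    law = output_pair_law(M)
    first0 = sum(p for (a, _), p in law.items() if a == 0)
    return law.get((0, 1), 0) / first0

law1, law2 = output_pair_law(M1), output_pair_law(M2)
print(law1)   # A1: only the pairs 01 and 10 occur
print(law2)   # A2: all four pairs are equally likely
print(prob_second_is_1_given_first_is_0(M1), prob_second_is_1_given_first_is_0(M2))
```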

Theorem 3.2

For probabilistic sequential machines $A$ and $A'$, if for all finite deterministic machines $C$ and any cutpoint $\lambda$

$$T(A \to C, \lambda) = T(A' \to C, \lambda),$$

then $A \equiv_E A'$.

Proof: Suppose $E_A(x) \neq E_{A'}(x)$ for some tape $x$ of length $k$. Without loss of generality pick $E_A(x) > E_{A'}(x)$. Let $\lambda_c$ be a rational such that $E_A(x) > \lambda_c > E_{A'}(x)$. Let $C$ be a deterministic machine which, beginning at time $k$, computes the number $i_k - \lambda_c$, where $i_k$ is the input at time $k$. Since $\lambda_c$ is rational, $C$ needs only a finite number of states. $C$ accepts the string $x$ iff $i_k - \lambda_c > 0$, which can be decided in a finite number of steps. Then

$$x \in T(B \to C, \lambda_c) \iff E_{B \to C}(x) > \lambda_c,$$

but since $C$ is deterministic,

$$x \in T(B \to C, \lambda_c) \iff E_B(x) > \lambda_c.$$

Hence, letting $B = A$ and $B = A'$,

$$x \in T(A \to C, \lambda_c) \quad\text{and}\quad x \notin T(A' \to C, \lambda_c),$$

so

$$T(A \to C, \lambda_c) \neq T(A' \to C, \lambda_c).$$
By logical equivalence we have shown, for the class $\mathcal{C}$ of finite deterministic machines,

$$(\forall \lambda)(\forall C)[T(A \to C, \lambda) = T(A' \to C, \lambda)] \Rightarrow (\forall x)[E_A(x) = E_{A'}(x)].$$

Q.E.D.

By the example presented in Theorem 3.1 we know the converse is not true.

3.2 A MORE SATISFACTORY TECHNICAL NOTION OF INDISTINGUISHABILITY

The example at the beginning of this section shows that notions of machine equivalence such as $\equiv_E$ equivalence, and even distribution equivalence $\equiv_D$, break down under composition of machines.

In order to get a more satisfactory definition of behavioral equivalence, the conditional probability structure of probabilistic sequential machines will be explored. We will formulate a stronger concept of equivalence, called indistinguishability. It is based upon equality, for the two machines, of the probabilities of all possible output strings given all possible input strings. Following the development of Carlyle [6], a bound will be found for the length of strings needed for deciding whether two machines are indistinguishable.

In what follows it is assumed that $\Sigma^*$ contains the empty string $\Lambda$, so that $A(\Lambda) = E_n$, the identity matrix.

Definition 3.4. The conditional probability for a sequence of outputs $y = y_1y_2\dots y_m$, given a string of inputs $x = \sigma_1\dots\sigma_m$ and starting from an initial distribution $\Pi = (\Pi_1, \Pi_2, \dots, \Pi_n)$ in a machine $A$, will be written

$$P_\Pi^A(y/x),$$

or, if the machine involved is clear from context, just $P_\Pi(y/x)$.
We note that the symbols of the output alphabet are real numbers which occur as components of the output column vector $F$, i.e., the output alphabet $Y$ can be written

$$Y = \bigcup_{i=1}^n \{F_i\}.$$

Definition 3.5. The probability of a sequence of transitions from $S_i$ to $S_j$ with output sequence $y$ because of input sequence $x$ will be written

$$P_{S_i \to S_j}(y/x).$$


Definition 3.6. The conditional probability transition matrix $A(y_i/\sigma)$ is formed from $A(\sigma)$ by zeroing out all columns except those corresponding to states with output $y_i$. More formally, let

$$J_{y_i} = \{j : F_j = y_i\}, \qquad y_i \in Y,$$

and let $Q_{y_i}$ be the matrix with $[Q_{y_i}]_{jj} = 1$ for $j \in J_{y_i}$ and $[Q_{y_i}]_{jk} = 0$ otherwise. Then

$$A(y_i/\sigma) = A(\sigma)Q_{y_i}, \qquad y_i \in Y,\ \sigma \in \Sigma.$$

Note that $[A(y_k/\sigma)]_{ij}$ is just $P_{S_i \to S_j}(y_k/\sigma)$.
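Definition 3.6 is a column mask. The Python sketch below builds $Q_y$ and $A(y/\sigma) = A(\sigma)Q_y$ for the matrix $A_1(0)$ of Section 3.1, with the output vector taken as $F_1 = (0, 0, 1, 1, 0)^T$. That vector is an assumption, consistent with Tables 3.1 and 3.2. The sketch then checks that the masked matrices add back up to $A(\sigma)$, which must happen because every state has exactly one output. The function names are hypothetical.

```python
from fractions import Fraction as Fr

def Q(F, y):
    # Diagonal 0/1 matrix that keeps exactly the states j with F_j == y.
    n = len(F)
    return [[1 if (j == k and F[j] == y) else 0 for k in range(n)] for j in range(n)]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def cond_matrix(A_sigma, F, y):
    # A(y/sigma) = A(sigma) Q_y: zero every column whose state does not output y.
    return mat_mul(A_sigma, Q(F, y))

h = Fr(1, 2)
A1_0 = [[0, h, h, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
F1 = [0, 0, 1, 1, 0]
parts = {y: cond_matrix(A1_0, F1, y) for y in sorted(set(F1))}
for y, M in parts.items():
    print(y, M[0])   # first row: P_{S_1 -> S_j}(y / sigma)
```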


Remark 3.6: Let $y \in Y^*$, $x \in \Sigma^*$, $y_1 \in Y$ and $\sigma \in \Sigma$, with $\mathrm{lg}(y) = \mathrm{lg}(x)$. Then

$$A(yy_1/x\sigma) = A(y/x)A(y_1/\sigma).$$

By definition, $[A(yy_1/x\sigma)]_{ij}$ is $P_{S_i \to S_j}(yy_1/x\sigma)$. Since transitions to different intermediate states $S_k$ are mutually exclusive events,

$$P_{S_i \to S_j}(yy_1/x\sigma) = \sum_{k=1}^n P_{S_i \to S_k}(y/x)\,P_{S_k \to S_j}(y_1/\sigma).$$

Using the definitions again,

$$[A(yy_1/x\sigma)]_{ij} = \sum_{k=1}^n [A(y/x)]_{ik}[A(y_1/\sigma)]_{kj},$$

or in matrix form

$$A(yy_1/x\sigma) = A(y/x)A(y_1/\sigma).$$

Hence the conditional probability transition matrices for strings can be generated by the conditional probability transition matrices for symbols, as was the case for the transition matrices $A(x)$.

Remark 3.7: Given an initial distribution $\Pi$ over states, the probability of getting output string $y$ from input string $x$ is just

$$P_\Pi(y/x) = \sum_{j=1}^n \sum_{i=1}^n \Pi_i [A(y/x)]_{ij}.$$

With $S = (1, \dots, 1)^T$, the all-ones column vector, we can write

$$P_\Pi(y/x) = \Pi A(y/x)S.$$
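Remarks 3.6 and 3.7 give a direct recipe for $P_\Pi(y/x)$: multiply the masked symbol matrices, then sum the row $\Pi A(y/x)$. The Python sketch below applies this recipe to $A_1$ and $A_2$ of Section 3.1. It assumes the shared output vector $(0, 0, 1, 1, 0)^T$, which is consistent with Tables 3.1 and 3.2, and its function names are hypothetical. The two machines agree on every one-step probability but already differ for $y = 01$, $x = 00$.

```python
from fractions import Fraction as Fr

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def Q(F, y):
    n = len(F)
    return [[1 if (j == k and F[j] == y) else 0 for k in range(n)] for j in range(n)]

def cond_string_matrix(A, F, y, x):
    # Remark 3.6: A(y/x) = A(y_1/x_1) ... A(y_m/x_m), assuming lg(y) == lg(x).
    n = len(F)
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for ys, xs in zip(y, x):
        M = mat_mul(M, mat_mul(A[xs], Q(F, ys)))
    return M

def P(Pi, A, F, y, x):
    # Remark 3.7: P_Pi(y/x) = Pi A(y/x) S, with S the all-ones column vector.
    M = cond_string_matrix(A, F, y, x)
    n = len(F)
    return sum(Pi[i] * M[i][j] for i in range(n) for j in range(n))

h = Fr(1, 2)
M1 = [[0, h, h, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
M2 = [[0, h, h, 0, 0], [0, 0, 0, h, h], [0, 0, 0, h, h], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
A1 = {"0": M1, "1": M1}     # machines of Section 3.1 (input-independent)
A2 = {"0": M2, "1": M2}
F = [0, 0, 1, 1, 0]         # assumed output vector, shared by both machines
Pi = [1, 0, 0, 0, 0]

print(P(Pi, A1, F, (0,), "1"), P(Pi, A2, F, (0,), "1"))        # one step: equal
print(P(Pi, A1, F, (0, 1), "00"), P(Pi, A2, F, (0, 1), "00"))  # two steps: differ
```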

Remark 3.8: We note the following:

$$\sum_{y_1 \in Y} P_\Pi(yy_1/x\sigma) = P_\Pi(y/x) \qquad \text{for all } \sigma \in \Sigma,$$

since

$$\sum_{y_1 \in Y} P_\Pi(yy_1/x\sigma) = \sum_{y_1 \in Y} \Pi A(y/x)A(y_1/\sigma)S = \Pi A(y/x)\Big(\sum_{y_1 \in Y} A(y_1/\sigma)\Big)S = \Pi A(y/x)A(\sigma)S.$$

But for any $n \times n$ stochastic matrix $C$,

$$CS = S.$$

Hence

$$\Pi A(y/x)A(\sigma)S = \Pi A(y/x)S = P_\Pi(y/x).$$


Definition 3.7. The terminal distribution $\Pi^*(y/x)$ for a sequence of outputs $y$ given inputs $x$ is

$$\Pi^*(y/x) = \frac{\Pi A(y/x)}{\Pi A(y/x)S}.$$

The $i$'th component of $\Pi^*(y/x)$ is the probability of being in state $i$ after input string $x$ has occurred and output string $y$ has been observed.

The following identity holds whenever $P_\Pi(y/x) > 0$:

$$P_\Pi(yy_1/x\sigma) = P_\Pi(y/x)\,P_{\Pi^*(y/x)}(y_1/\sigma), \qquad y_1 \in Y,\ \sigma \in \Sigma.$$

Definition 3.8: Machines $A$ and $A'$ are indistinguishable, written $A \equiv_I A'$, if

$$P_\Pi^A(y/x) = P_{\Pi'}^{A'}(y/x) \qquad \forall x \in \Sigma^*,\ \forall y \in Y^*.$$

Hence our concept of indistinguishability for machines depends on observable identity when both machines are started from their initial state distributions.
Definition 3.9: Machines $A$ and $A'$ are k-indistinguishable if

$$P_\Pi^A(y/x) = P_{\Pi'}^{A'}(y/x), \qquad x \in \Sigma^m,\ y \in Y^m, \quad \text{for } m = 0,1,\dots,k.$$

Definition 3.10: In a machine $A$, two initial state distributions $\Pi$ and $\lambda$ are indistinguishable if

$$P_\Pi^A(y/x) = P_\lambda^A(y/x) \qquad \forall y \in Y^*,\ \forall x \in \Sigma^*.$$

Definition 3.11: In a machine $A$, two initial state distributions $\Pi$ and $\lambda$ are k-indistinguishable if

$$P_\Pi(y/x) = P_\lambda(y/x)$$

for all $x$ such that $\mathrm{lg}(x) \le k$ and all $y$ such that $\mathrm{lg}(y) = \mathrm{lg}(x)$.
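For small $k$, Definitions 3.9 and 3.11 can be checked by brute force, enumerating all input and output strings up to length $k$. The Python sketch below does so for the machines $A_1$ and $A_2$ of Section 3.1. It assumes the output vector $(0, 0, 1, 1, 0)^T$, which is consistent with Tables 3.1 and 3.2, and its function names are hypothetical. The enumeration grows exponentially in $k$, which is why a bound on the string length is useful.

```python
from fractions import Fraction as Fr
from itertools import product

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def Q(F, y):
    n = len(F)
    return [[1 if (j == k and F[j] == y) else 0 for k in range(n)] for j in range(n)]

def P(Pi, A, F, y, x):
    # P_Pi(y/x) = Pi A(y_1/x_1) ... A(y_m/x_m) S, assuming lg(y) == lg(x).
    n = len(F)
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for ys, xs in zip(y, x):
        M = mat_mul(M, mat_mul(A[xs], Q(F, ys)))
    return sum(Pi[i] * M[i][j] for i in range(n) for j in range(n))

def k_indistinguishable(machine_a, machine_b, k, inputs="01"):
    # Definition 3.9: compare P(y/x) for every x in Sigma^m, y in Y^m, m = 0..k.
    Pi_a, A_a, F_a = machine_a
    Pi_b, A_b, F_b = machine_b
    Y = sorted(set(F_a) | set(F_b))
    for m in range(k + 1):
        for x in product(inputs, repeat=m):
            for y in product(Y, repeat=m):
                if P(Pi_a, A_a, F_a, y, x) != P(Pi_b, A_b, F_b, y, x):
                    return False
    return True

h = Fr(1, 2)
M1 = [[0, h, h, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
M2 = [[0, h, h, 0, 0], [0, 0, 0, h, h], [0, 0, 0, h, h], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
F = [0, 0, 1, 1, 0]
Pi = [1, 0, 0, 0, 0]
machine1 = (Pi, {"0": M1, "1": M1}, F)
machine2 = (Pi, {"0": M2, "1": M2}, F)

print(k_indistinguishable(machine1, machine2, 1), k_indistinguishable(machine1, machine2, 2))
```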

Checking whether the indistinguishability definition (3.8) for machines, or (3.10) for initial distributions, holds using only the definitions involves calculating an unbounded sequence of conditional probabilities. The next section shows a bound for the length of strings whose probabilities need to be calculated. As in the deterministic machine case, if $n$ is the number of states, then only strings of length $n-2$ or less need be considered in establishing indistinguishability.

3.3 THE RELATIONSHIP BETWEEN THE INTUITIVE AND TECHNICAL CONCEPTS OF INDISTINGUISHABILITY

We have yet to relate the intuitive notion of indistinguishability to the technical Definition 3.8. The next theorem shows that two machines indistinguishable in the technical sense are indeed indistinguishable when plugged into $C$, any finite-state probabilistic or deterministic machine.

Let $Z = C(Y^*)$ be the random variable taking on values of strings of outputs of $C$, given strings of inputs from the random variable $Y$. Since $C$ has a finite number of states, it is assumed that finite strings of $Z$ depend only on finite strings of $Y^*$.

Theorem 3.4

Let $\mathcal{C}^*$ be the class of finite-state probabilistic and deterministic sequential machines. For any $C \in \mathcal{C}^*$,

$$P_\Pi^{A \to C}(Z = C(Y_A)/x) = P_{\Pi'}^{A' \to C}(Z' = C(Y_{A'})/x)$$

if $A \equiv_I A'$, for $Y_A$ and $Y_{A'}$ having the same range $Y$.

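Proof: For any fixed value $y_A$ of the output string random variable $Y_A$ of A,

$$P^{A\to C}_{\Pi}\big(z = C(y_A),\ Y_A = y_A/x\big) = P^A_{\Pi}(y_A/x)\,P^C\big(z = C(y_A)/y_A\big).$$

Since the occurrences of different $y_A$ are disjoint events, summing over all $y_A \in Y^*$ with $\mathrm{lg}(y_A) = \mathrm{lg}(x)$ gives

$$P^{A\to C}_{\Pi}\big(z = C(Y_A)/x\big) = \sum_{y_A \in Y^{\mathrm{lg}(x)}} P^A_{\Pi}(y_A/x)\,P^C\big(z = C(y_A)/y_A\big).$$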

Likewise

$$P^{A'\to C}_{\Pi'}\big(z' = C(Y_{A'})/x\big) = \sum_{y_{A'} \in Y^{\mathrm{lg}(x)}} P^{A'}_{\Pi'}(y_{A'}/x)\,P^C\big(z' = C(y_{A'})/y_{A'}\big).$$

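But Z and Z' range over the same set, and so do $Y_A$ and $Y_{A'}$. Moreover, A and A' are indistinguishable, i.e.

$$P^A_{\Pi}(Y_A = y/x) = P^{A'}_{\Pi'}(Y_{A'} = y/x) \quad \text{for all } y \in Y^{\mathrm{lg}(x)}.$$

Hence the two sums agree term by term, and we get

$$P^{A\to C}_{\Pi}\big(z = C(Y_A)/x\big) = P^{A'\to C}_{\Pi'}\big(z' = C(Y_{A'})/x\big),$$

which means $A\to C$ and $A'\to C$ are indistinguishable. Q.E.D.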
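The sum in the proof can be evaluated mechanically. This Python sketch stores each machine as a dictionary `M[(output, input)]` of matrices. It assumes C starts from a fixed initial distribution `pi_c` and is driven only by the output string of A, which is the series connection A→C. These representations and all names are illustrative assumptions. `series_prob` evaluates $\sum_{y} P^A_\Pi(y/x)\,P^C(z/y)$ exactly as in the proof, so two indistinguishable machines give identical composed probabilities.

```python
from fractions import Fraction as Fr
from itertools import product


def cond_prob(pi, M, out, inp):
    """P_pi(out/inp) = pi M(o1/i1) ... M(om/im) 1 for a machine stored as M[(o, i)]."""
    n = len(pi)
    v = [Fr(1)] * n
    for o, i in reversed(list(zip(out, inp))):
        T = M[(o, i)]
        v = [sum(T[r][c] * v[c] for c in range(n)) for r in range(n)]
    return sum(p * vr for p, vr in zip(pi, v))


def series_prob(pi_a, A, ys, pi_c, C, z, x):
    """P^{A->C}(z/x): sum over all output strings y of A with lg(y) = lg(x)."""
    return sum(cond_prob(pi_a, A, y, x) * cond_prob(pi_c, C, z, y)
               for y in product(ys, repeat=len(x)))


h = Fr(1, 2)
# Hypothetical A: two states that swap on every input and emit a fair coin flip.
A = {(0, "a"): [[0, h], [h, 0]], (1, "a"): [[0, h], [h, 0]]}
# Hypothetical A': one state emitting a fair coin flip, indistinguishable from A.
A1 = {(0, "a"): [[h]], (1, "a"): [[h]]}
# Hypothetical C: one state that copies its input symbol, flipping it with probability 1/4.
C = {(z, y): [[Fr(3, 4) if z == y else Fr(1, 4)]] for z in (0, 1) for y in (0, 1)}
x = ("a", "a")
print(series_prob([1, 0], A, [0, 1], [1], C, (0, 1), x),
      series_prob([1], A1, [0, 1], [1], C, (0, 1), x))  # 1/4 1/4
```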

Since the machine C might ignore its inputs, it is clear that the converse of Theorem 3.4 does not hold.

Hence the criterion of indistinguishability as a submachine has led us to the technical Definition 3.8 as a kind of behavioral equivalence.

4. FINITE COMPLETE SETS OF INVARIANTS FOR THE BEHAVIORAL EQUIVALENCES $\equiv_E$ AND $\equiv_I$ AND THE REDUCTION CONGRUENCE RELATIONS $R_F$ AND $R_F^N$

The results of the previous sections involve relations defined over all finite strings of the input alphabet. This section finds bounds on the length of the strings that must be considered in order to decide whether two elements of the domain of a relation are in the same class.


Definition 4.1. A set of functions $f_1,\ldots,f_m$ is a complete set of invariants for the relation R if for all x and y in the domain of R

$$xRy \iff f_i(x) = f_i(y), \quad i = 1,\ldots,m.$$

We now show sets of functions which are invariants for the above relations. A set of functions which are invariant over $R_F$ and $R_F^N$ is:

for all z: $\mathrm{lg}(z) \le \ell$, for all $I \in S$


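For the relation $\equiv_I$, on the other hand, the set of functions below is a set of invariants:

$$h_{(x,y)}(\pi) = P^A_{\pi}(y/x) \quad \text{for all } x \text{ and } y: \mathrm{lg}(x) = \mathrm{lg}(y) \le \ell.$$

Likewise the set

$$h_{(x,1)}(\pi) = E_{\pi}(x), \qquad h_{(x,r)}(\pi) = \mu_r(x) \quad \text{for } r = 2,\ldots,N$$

is a set of invariants for the corresponding N-moment relations.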

It is clear that for an unbounded $\ell$, the above are complete sets of invariants. However, in what follows a finite value of $\ell$ will be found for each of these cases. In the case of $\equiv_I$ the bound will be the same as the well-known Moore bound for deterministic automata, but in the case of $R_F^N$ it will be lower for most machines. The main tool used in finding the various values of $\ell$ is the following simple lemma.

4.1 THE FUNDAMENTAL LEMMA


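Lemma 4.1

Given an n-dimensional vector space V, a finite set $T = \{T_i\}$ where each $T_i$ is a linear transformation on V, and some finite set of vectors $V_0 \subset V$ such that $\dim\langle V_0\rangle = r \ge 1$.

Define

$$M_0 = V_0,$$

$$M_1 = \{v_0 \cdot T_i : T_i \in T,\ v_0 \in V_0\},$$

$$M_k = \{v_0 \cdot T_{i_1} \cdots T_{i_k} : T_{i_j} \in T,\ v_0 \in V_0\},$$

and let

$$L_i = \Big\langle \bigcup_{j=0}^{i} M_j \Big\rangle.$$

Then there exists an integer J(T) such that

(i) $L_{J(T)+k} = L_{J(T)}$ for all $k \ge 0$;

(ii) $L_k \subsetneq L_{k+1}$ for $k < J(T)$;

(iii) $J(T) \le n - r$.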


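Proof: $L_0 \subseteq L_1 \subseteq \cdots \subseteq L_i \subseteq \cdots$ as a consequence of the definition. The sequence $\{\dim L_j\}_{j=0}^{\infty}$ is nondecreasing and bounded above by n, the dimension of V. Call J(T) the smallest index k such that $L_{k+1} = L_k$. We must show that the sequence $\{\dim L_j\}_{j=0}^{J(T)}$ is strictly increasing and that $L_j$ no longer changes after $J(T)$. This requires that for all j

$$L_{j+1} \ne L_{j+2} \implies L_j \ne L_{j+1},$$

which is logically equivalent to

$$L_j = L_{j+1} \implies L_{j+1} = L_{j+2}.$$

Hence it is sufficient to show the latter. Assume $L_j = L_{j+1}$ and $L_{j+2} \ne L_{j+1}$. W.L.O.G. pick

$$v = v_0 \cdot T_{i_1} \cdots T_{i_{j+2}} \in L_{j+2} - L_{j+1}.$$

But

$$w = v_0 \cdot T_{i_1} \cdots T_{i_{j+1}} \in L_{j+1} = L_j.$$

So there is a finite set of indices I of a spanning set U for $L_j$,

$$U = \{u_i = v_0^i \cdot T_{g_1^i} \cdots T_{g_{k_i}^i} : i \in I\},$$

such that $k_i \le j$, and constants $c_i$ with $w = \sum_{i\in I} c_i u_i$. So

$$v = w \cdot T_{i_{j+2}} = \sum_{i\in I} c_i\, u_i \cdot T_{i_{j+2}} \in L_{j+1},$$

since each $u_i \cdot T_{i_{j+2}}$ applies at most j+1 transformations to a vector of $V_0$. That is, $L_{j+2} \subseteq L_{j+1}$, contradicting the choice of v.

Now consider the sequence of dimensions

$$\dim L_0, \dim L_1, \ldots, \dim L_{J(T)}.$$

Since $L_k \subsetneq L_{k+1}$ for $k+1 \le J(T)$, we have $\dim L_k < \dim L_{k+1}$ for $k+1 \le J(T)$. Noting that $\dim L_0 = r$ and $\dim L_{J(T)} \le n$, this gives

$$r + J(T) \le \dim L_{J(T)} \le n,$$

$$J(T) \le n - r. \qquad \text{Q.E.D.}$$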
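Lemma 4.1 also describes a procedure: generate $M_0, M_1, \ldots$ and stop at the first index where the span stops growing. The Python sketch below does this with exact rational Gaussian elimination and uses the lemma's row-vector convention $v \cdot T$. The function names are hypothetical. The frontier $M_j$ is not pruned, so it grows like $|T|^j$, and the sketch is meant only for small examples.

```python
from fractions import Fraction as Fr


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fr(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def vec_mat(v, T):
    """Row vector v times matrix T (the lemma's v . T)."""
    return [sum(v[i] * T[i][j] for i in range(len(v))) for j in range(len(T[0]))]


def lemma_bound(V0, Ts):
    """Return (J(T), [dim L_0, ..., dim L_J]) for the chain L_0 <= L_1 <= ... of Lemma 4.1."""
    span, frontier = list(V0), list(V0)  # generators of L_j and the set M_j
    dims = [rank(span)]
    while True:
        frontier = [vec_mat(v, T) for v in frontier for T in Ts]  # M_{j+1}
        span = span + frontier
        d = rank(span)
        if d == dims[-1]:  # L_{j+1} = L_j, so the current j is J(T)
            return len(dims) - 1, dims
        dims.append(d)


# Shift map on a 3-dimensional space: e1 -> e2 -> e3 -> 0, starting from V0 = {e1}, r = 1.
shift = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(lemma_bound([[1, 0, 0]], [shift]))  # (2, [1, 2, 3]): here J(T) = n - r exactly
```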

4.2 A BOUND FOR TESTING FOR MEMBERSHIP IN $\equiv_I$




Theorem 4.1

If A is a probabilistic sequential machine with n states, then (n-1)-indistinguishability of initial distributions $\pi$ and $\pi'$ is sufficient to guarantee indistinguishability of the initial distributions $\pi$ and $\pi'$.

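Proof: Using Lemma 4.1, let

$$V_0 = \{S\}, \quad \dim\langle V_0\rangle = 1, \quad T = \{A(y_i/a) : y_i \in Y,\ a \in \Sigma\}, \quad V_0 \cdot T_i = A(y_i/a)S.$$

By the lemma, for any string $x = a_1 \cdots a_{r'}$ with $r'$ finite and any $y \in Y^{r'}$, $A(y/x)S$ can be expressed as

$$A(y/x)S = \sum_{i\in I} c_i\, A\big(y_{j_1}^i \cdots y_{j_{r_i}}^i / a_{j_1}^i \cdots a_{j_{r_i}}^i\big)\,S \qquad (*)$$

with $r_i \le n-1$ for $i \in I$, $y_{j_k}^i \in Y$, $a_{j_k}^i \in \Sigma$ (for $k = 1,\ldots,r_i$).

Let $y^i = y_{j_1}^i \cdots y_{j_{r_i}}^i$ and $x^i = a_{j_1}^i \cdots a_{j_{r_i}}^i$. Multiplying (*) by $\pi$ gives, for the initial distribution $\pi$,

$$P^A_{\pi}(y/x) = \pi A(y/x)S = \sum_{i\in I} c_i\, \pi A(y^i/x^i)S = \sum_{i\in I} c_i\, P^A_{\pi}(y^i/x^i), \quad \mathrm{lg}(y^i) = \mathrm{lg}(x^i) \le n-1,$$

and multiplying (*) by $\pi'$ gives

$$P^A_{\pi'}(y/x) = \sum_{i\in I} c_i\, P^A_{\pi'}(y^i/x^i).$$

By the assumption of (n-1)-indistinguishability for $\pi$ and $\pi'$,

$$P^A_{\pi}(y^i/x^i) = P^A_{\pi'}(y^i/x^i), \quad \mathrm{lg}(x^i) = \mathrm{lg}(y^i) \le n-1.$$

Hence

$$P^A_{\pi}(y/x) = P^A_{\pi'}(y/x). \qquad \text{Q.E.D.}$$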
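The proof also suggests a test that avoids enumerating every pair (x, y). Grow a basis of the span of the vectors $A(y/x)S$ one symbol at a time, keep only vectors that enlarge the span, and compare $\pi$ and $\pi'$ on that basis. This Python sketch assumes a machine stored as a dictionary `A[(y, a)]` of n×n matrices. It also takes S to be the all-ones column, so that $P_\pi(y/x) = \pi A(y/x)S$. Both choices and all names are assumptions for illustration. By Lemma 4.1 with r = 1, no basis vector needs a string longer than n - 1.

```python
from fractions import Fraction as Fr


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fr(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def spanning_vectors(A, n):
    """Basis of span{A(y/x)S}, grown breadth-first as in Lemma 4.1 with V0 = {S}."""
    S = [Fr(1)] * n  # assumption: S is the all-ones column vector
    basis, frontier = [S], [S]
    while frontier:
        new = []
        for v in frontier:
            for M in A.values():  # every symbol matrix A(y/a)
                w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
                if rank(basis + [w]) > len(basis):  # w enlarges the span
                    basis.append(w)
                    new.append(w)
        frontier = new
    return basis


def indistinguishable(pi, pi2, A, n):
    """pi and pi2 are indistinguishable iff they agree on every basis vector."""
    dot = lambda p, v: sum(a * b for a, b in zip(p, v))
    return all(dot(pi, v) == dot(pi2, v) for v in spanning_vectors(A, n))


# Hypothetical 3-state machine: state 1 emits 0, states 2 and 3 emit 1; no state ever moves.
A = {(0, "a"): [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
     (1, "a"): [[0, 0, 0], [0, 1, 0], [0, 0, 1]]}
print(len(spanning_vectors(A, 3)))  # 2 basis vectors suffice
print(indistinguishable([0, 1, 0], [0, 0, 1], A, 3),
      indistinguishable([1, 0, 0], [0, 1, 0], A, 3))  # True False
```

Every vector A(y/a)v with v in the basis was checked against the basis when v was processed, so the final span is closed under all symbol matrices. It therefore contains every $A(y/x)S$.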

4.3 EQUIVALENCE OF DISTRIBUTIONS IN ONE MACHINE

Using Lemma 4.1, we can now make the definition of the relations $R_F$ and $R_F^N$ of Section 2 effective. A bound will be obtained for the lengths of the strings that must be considered to decide whether x and y are in the same congruence class.

Definition 4.2: Distributions $\pi$ and $\lambda$ are equivalent for a machine A, written $\pi \sim_A \lambda$, if $\pi A(x)F = \lambda A(x)F$ for all $x \in \Sigma^*$.

Definition 4.3: Distributions $\pi$ and $\lambda$ are K-equivalent for a machine A, written $\pi \sim_A^K \lambda$, if

$$\pi A(x)F = \lambda A(x)F, \quad x \in \Sigma^*,\ 0 \le \mathrm{lg}(x) \le K.$$


Theorem 4.2

If A is a probabilistic sequential machine with n states and if $\pi$ and $\lambda$ are (n-1)-equivalent in A, then $\pi \sim_A \lambda$.

* 

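Proof: Let x be in $\Sigma^*$ and let us use Lemma 4.1 with

$$V_0 = \{F\}, \quad \dim\langle V_0\rangle = 1, \quad T = \{A(a) : a \in \Sigma\}, \quad V_0 \cdot T_i = A(i)F.$$

Hence there is a finite set of vectors $A(x^i)F$ for $i \in I$ with $\mathrm{lg}(x^i) \le n-1$ such that

$$A(x)F = \sum_{i\in I} c_i\, A(x^i)F, \quad \mathrm{lg}(x^i) \le n-1.$$

Hence

$$\pi A(x)F = \sum_{i\in I} c_i\, \pi A(x^i)F, \qquad \lambda A(x)F = \sum_{i\in I} c_i\, \lambda A(x^i)F.$$

(n-1)-equivalence gives

$$\pi A(x^i)F = \lambda A(x^i)F, \quad i \in I.$$

So

$$\pi A(x)F = \lambda A(x)F. \qquad \text{Q.E.D.}$$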
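Theorem 4.2 turns Definition 4.2 into a finite check. In this Python sketch a machine is assumed, for illustration, to be a dictionary `A[a]` of n×n stochastic matrices indexed by input symbol, with F the output vector. `expectation` evaluates $\pi A(x)F$, and `equivalent` compares $\pi$ and $\lambda$ on all strings of length at most n - 1. The function names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product


def expectation(pi, A, F, x):
    """pi A(x) F, with A(x) = A(a1) ... A(am) applied to F from the right."""
    n = len(F)
    v = list(F)
    for a in reversed(x):
        v = [sum(A[a][i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(p * vi for p, vi in zip(pi, v))


def equivalent(pi, lam, A, F):
    """pi ~_A lam via Theorem 4.2: (n-1)-equivalence suffices for an n-state machine."""
    n = len(F)
    return all(expectation(pi, A, F, x) == expectation(lam, A, F, x)
               for m in range(n)  # lengths 0, 1, ..., n-1
               for x in product(sorted(A), repeat=m))


# Hypothetical 3-state machine with one input: state 1 moves to state 2; states 2 and 3 are absorbing.
A = {"a": [[0, 1, 0], [0, 1, 0], [0, 0, 1]]}
F = [2, 1, 3]
e1, mix = [1, 0, 0], [0, Fr(1, 2), Fr(1, 2)]
print(expectation(e1, A, F, ()), expectation(mix, A, F, ()))  # 2 2
print(equivalent(e1, mix, A, F))  # False: they differ on x = a
```

The two distributions agree on the empty string, so the experiment has to look beyond length 0 before they can be told apart.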


4.4 BOUNDS FOR TESTING FOR MEMBERSHIP IN $\equiv_E$ AND $R_F$


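Definition 4.4: The abstract join of the probabilistic sequential machines $A = \langle \pi, A(0),\ldots,A(k-1), F\rangle$ with n states and $A' = \langle \lambda, A'(0),\ldots,A'(k-1), F'\rangle$ with n' states is the abstract (n+n')-state machine, written

$$A \oplus A' = \langle \pi^{\oplus}, A^{\oplus}(0),\ldots,A^{\oplus}(k-1), F^{\oplus}\rangle$$

where

$$A^{\oplus}(i) = \begin{pmatrix} A(i) & 0 \\ 0 & A'(i) \end{pmatrix} \quad \text{and} \quad F^{\oplus} = \begin{pmatrix} F \\ F' \end{pmatrix}.$$

$\pi$ and $\lambda$ can be embedded in the (n+n')-dimensional space as

$$\pi^{\oplus} = (\pi, \underbrace{0,\ldots,0}_{n' \text{ zeroes}}), \qquad \lambda^{\oplus} = (\underbrace{0,\ldots,0}_{n \text{ zeroes}}, \lambda).$$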

The problem of deciding whether two machines A and A' are expectation equivalent, i.e.

$$\pi A(x)F = \lambda A'(x)F' \quad \forall x \in \Sigma^*,$$

is logically equivalent to deciding whether $\pi^{\oplus}$ and $\lambda^{\oplus}$ are equivalent in $A \oplus A'$, i.e. whether

$$\pi^{\oplus} \sim_{A\oplus A'} \lambda^{\oplus}.$$


Hence, following Carlyle [6], we use Theorem 4.2 to state

Remark 4.1:

$$\pi^{\oplus} \sim_{A\oplus A'} \lambda^{\oplus} \iff \pi^{\oplus} \sim^{\,n+n'-1}_{A\oplus A'} \lambda^{\oplus},$$

which gives the following theorem.


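Theorem 4.3

Let A and A' be probabilistic sequential machines having n and n' states respectively. Then a necessary and sufficient condition that A and A' are expectation equivalent is that they agree on all strings of length at most n+n'-1:

$$\big[\pi A(z)F = \lambda A'(z)F' \ \ \forall z \in \Sigma^*\big] \iff \big[\pi A(x)F = \lambda A'(x)F' \ \ \forall x: \mathrm{lg}(x) \le n+n'-1\big].$$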
Theorem 4.3 makes the experimental determination of expectation equivalence possible, provided the numbers of states of both machines are known. Furthermore, it gives a bound on the process of finding whether two strings are in the same equivalence class under the reduction relation $R_F$ of Chapter 1. This result is summarized in the following theorem.
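The abstract join makes the test of Theorem 4.3 mechanical. This Python sketch assumes each machine is a dictionary of stochastic matrices per input symbol, with both machines over the same input alphabet. The function names are hypothetical. It builds $A^{\oplus}(i)$ as a block-diagonal matrix, stacks F over F', pads $\pi$ and $\lambda$ with zeroes, and compares the two embedded distributions on all strings of length at most n + n' - 1.

```python
from fractions import Fraction as Fr
from itertools import product


def join(A, F, A2, F2):
    """Block-diagonal matrices A(i) (+) A'(i) and the stacked output vector (F, F')."""
    n, n2 = len(F), len(F2)
    J = {a: [list(row) + [0] * n2 for row in A[a]] + [[0] * n + list(row) for row in A2[a]]
         for a in A}
    return J, list(F) + list(F2)


def expectation(pi, A, F, x):
    """pi A(x) F, applying the symbol matrices to F from the right."""
    n = len(F)
    v = list(F)
    for a in reversed(x):
        v = [sum(A[a][i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(p * vi for p, vi in zip(pi, v))


def expectation_equivalent(pi, A, F, lam, A2, F2):
    """Theorem 4.3: compare (pi, 0) and (0, lam) in the join on all x with lg(x) <= n + n' - 1."""
    J, FJ = join(A, F, A2, F2)
    n, n2 = len(F), len(F2)
    pj, lj = list(pi) + [0] * n2, [0] * n + list(lam)
    return all(expectation(pj, J, FJ, x) == expectation(lj, J, FJ, x)
               for m in range(n + n2) for x in product(sorted(J), repeat=m))


h = Fr(1, 2)
A = {"a": [[0, 1], [1, 0]]}  # hypothetical 2-state machine: every input swaps the states
A2 = {"a": [[1]]}            # hypothetical 1-state machine
print(expectation_equivalent([h, h], A, [1, 3], [1], A2, [2]))  # True: expectation is always 2
print(expectation_equivalent([h, h], A, [1, 3], [1], A2, [3]))  # False
```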


Theorem 4.4

Strings x and y are in the same equivalence class under the reduction relation $R_F$ of an n-state probabilistic sequential machine A if and only if $E_I(xz) = E_I(yz)$ for all strings z with $\mathrm{lg}(z) \le n-1$ and all $I \in S$.

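Proof:

$$\begin{aligned}
xR_Fy &\iff E_I(xz) = E_I(yz) \quad \forall z \in \Sigma^*,\ \forall I \in S\\
&\iff IA(x)A(z)F = IA(y)A(z)F \quad \forall z \in \Sigma^*,\ \forall I \in S.
\end{aligned}$$

Let $\pi = IA(x)$ and $\lambda = IA(y)$. Then

$$xR_Fy \iff \pi A(z)F = \lambda A(z)F \quad \forall z \in \Sigma^* \iff \pi \sim_A \lambda.$$

By Theorem 4.2 and its obvious converse, we get

$$xR_Fy \iff IA(x) \sim_A^{\,n-1} IA(y) \quad \forall I \in S,$$

which gives the theorem.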
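Theorem 4.4 gives a finite test for $xR_Fy$. The Python sketch below assumes, for illustration, that `A[a]` holds stochastic matrices and F is the output vector. The names are hypothetical. Because the set S consists of unit vectors, the vector $A(x)A(z)F$ lists $E_I(xz)$ for every $I \in S$ at once. Comparing $A(x)A(z)F$ with $A(y)A(z)F$ for every z of length at most n - 1 therefore covers all I.

```python
from itertools import product


def apply_word(A, x, v):
    """Return A(x) v = A(a1) ... A(am) v, applying the matrices right to left."""
    for a in reversed(x):
        v = [sum(A[a][i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    return v


def reduction_equivalent(A, F, x, y):
    """x R_F y iff A(x)A(z)F = A(y)A(z)F for all z with lg(z) <= n - 1 (Theorem 4.4)."""
    n = len(F)
    for m in range(n):
        for z in product(sorted(A), repeat=m):
            v = apply_word(A, z, list(F))  # A(z) F
            if apply_word(A, x, v) != apply_word(A, y, v):  # one entry per I in S
                return False
    return True


A = {"a": [[0, 1], [1, 0]], "b": [[1, 0], [0, 1]]}  # hypothetical: 'a' swaps, 'b' does nothing
F = [0, 1]
print(reduction_equivalent(A, F, "aa", ""), reduction_equivalent(A, F, "ab", "ba"),
      reduction_equivalent(A, F, "a", ""))  # True True False
```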


4.5 BOUNDS FOR TESTING FOR MEMBERSHIP IN $\equiv_E^N$ AND $R_F^N$


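Definition 4.5: $n_F$ is the independence number of an n-state machine A with output vector F:

$$n_F = \dim\langle\{(F^i) : i = 1,2,\ldots,n\}\rangle,$$

where $(F^i)$ denotes the vector whose components are the i-th powers of the components of F.

It follows from vector space arguments that

$$n_F = \#\{F_u : F_u \ne 0\},$$

the number of distinct nonzero values among the components of F.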

The independence number is just the dimension of the space generated by powers of the components of the output vector F. For a Rabin automaton $n_F = 1$, and all central moments reduce to polynomials in what we may consider the first "central moment" $E_I(x)$. In general, if the independence number is $n_F$, then for all x in $\Sigma^*$ the $(n_F+1)$-st central moment reduces to a polynomial in the lower central moments, since

$$\mu_{n_F+1}(x) = IA(x)(F^{n_F+1}) + Q(x),$$

where Q(x) is a polynomial in which $IA(x)(F^i)$, $i = 1,\ldots,n_F$, occur. Since $n_F$ is the dimension of the space $\langle (F^i) : i = 1,2,\ldots,n\rangle$, there are constants $c_i$ with

$$(F^{n_F+1}) = \sum_{i=1}^{n_F} c_i\,(F^i).$$

Hence

$$\mu_{n_F+1}(x) = IA(x)\sum_{i=1}^{n_F} c_i\,(F^i) + Q(x) = \sum_{i=1}^{n_F} c_i\, IA(x)(F^i) + Q(x).$$
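Both facts used above can be checked numerically. This Python sketch, with hypothetical function names, computes $n_F$ as the rank of the componentwise powers $(F^1), \ldots, (F^n)$. It also finds the constants $c_i$ with $(F^{n_F+1}) = \sum_{i=1}^{n_F} c_i (F^i)$. They can be read off from the monic polynomial over the distinct nonzero values $d_j$ of F. If $\prod_j (t - d_j) = t^{n_F} - \sum_{i=1}^{n_F} c_i t^{i-1}$, then multiplying by t and evaluating at each $d_j$ gives $d_j^{n_F+1} = \sum_i c_i d_j^i$. Zero components satisfy the identity trivially.

```python
from fractions import Fraction as Fr


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fr(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def independence_number(F):
    """n_F = dim <(F^1), ..., (F^n)>, with powers taken componentwise."""
    return rank([[Fr(f) ** i for f in F] for i in range(1, len(F) + 1)])


def top_power_coefficients(F):
    """Constants c_1..c_m with (F^(m+1)) = sum_i c_i (F^i), m = number of distinct nonzero values."""
    poly = [Fr(1)]  # coefficients of prod_j (t - d_j), lowest degree first
    for d in sorted({Fr(f) for f in F if f != 0}):
        shifted = [Fr(0)] + poly  # t * poly
        poly = [s - d * p for s, p in zip(shifted, poly + [Fr(0)])]
    return [-c for c in poly[:-1]]  # c_i = -(coefficient of t^(i-1))


F = [0, 1, 2, 2]
print(independence_number(F))     # 2 distinct nonzero values
print(top_power_coefficients(F))  # c_1 = -2, c_2 = 3, i.e. F^3 = -2F + 3F^2
```

For a 0/1 output vector, as in a Rabin automaton, the sketch returns $n_F = 1$ and $c_1 = 1$, i.e. $(F^2) = F$.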


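Theorem 4.5

Let A be a probabilistic sequential machine with output vector F and n states. Then for any $r \le n_F$ and strings x and y in $\Sigma^*$:

$$\left.\begin{aligned} E_I(xz) &= E_I(yz) \\ \mu_2(xz) &= \mu_2(yz) \\ &\;\vdots \\ \mu_r(xz) &= \mu_r(yz) \end{aligned}\right\} \ \forall z \in \Sigma^* \iff \left\{\begin{aligned} E_I(xz') &= E_I(yz') \\ \mu_2(xz') &= \mu_2(yz') \\ &\;\vdots \\ \mu_r(xz') &= \mu_r(yz') \end{aligned}\right. \ \forall z': \mathrm{lg}(z') \le n - r.$$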

Proof: Use Lemma 4.1 with

$$V_0 = \{F, (F^2), \ldots, (F^r)\}, \quad \dim\langle V_0\rangle = r \le n_F, \quad \{T_i\} = \{A(i) : i \in \Sigma\}.$$

For any $v_0 \in \langle V_0\rangle$, write $v_0 = \sum_{k=1}^{r} c_k (F^k)$; then

$$v_0 \cdot T_i = A(i)v_0 = \sum_{k=1}^{r} c_k A(i)(F^k).$$

Consider any string z with $\mathrm{lg}(z) = m'$ finite. By the lemma there exist a spanning set of vectors $A(x^i)(F^{k_i})$, $i \in I$, with $\mathrm{lg}(x^i) \le n-r$ and $1 \le k_i \le r$, and constants $c_i(v_0)$ so that

$$A(z)v_0 = \sum_{i\in I} c_i(v_0)\, A(x^i)(F^{k_i}), \quad \mathrm{lg}(x^i) \le n-r.$$

Let $v_0$ range over the $(F^k)$, $k = 1,2,\ldots,r$. For any $\pi$ and $\lambda$ there are constants depending on $(F^k)$, namely $c_i((F^k))$, such that

$$\pi A(z)(F^k) = \sum_{i\in I} c_i((F^k))\, \pi A(x^i)(F^{k_i}), \qquad \lambda A(z)(F^k) = \sum_{i\in I} c_i((F^k))\, \lambda A(x^i)(F^{k_i}).$$

Hence the moments about zero of orders 1 through r from $\pi$ and $\lambda$ are equal for all strings if they are equal for all strings of length at most n-r. Let $\pi = IA(x)$ and $\lambda = IA(y)$. Then for any z and any initial distribution I,

$$IA(xz)(F^i) = IA(yz)(F^i), \quad i = 1,2,\ldots,r,$$

holds if and only if, for $i = 1,2,\ldots,r$,

$$IA(xz')(F^i) = IA(yz')(F^i)$$

for all strings z' of length less than or equal to n-r. By Theorem 2.1, any central moment $\mu_k(x)$ is a function of $IA(x)(F), \ldots, IA(x)(F^k)$, and conversely. The result is therefore established. Q.E.D.
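The proof reduces the moment conditions to moments about zero, $IA(xz')(F^k)$ for $k \le r$, on strings z' of length at most n - r. This Python sketch checks those vector equations for all unit initial vectors I at once. It assumes `A[a]` holds stochastic matrices and F is the output vector, and all names are hypothetical. It also recovers the mean and second central moment from the first two moments about zero, $\mu_2 = m_2 - m_1^2$.

```python
from fractions import Fraction as Fr
from itertools import product


def apply_word(A, x, v):
    """Return A(x) v, applying the symbol matrices right to left."""
    for a in reversed(x):
        v = [sum(A[a][i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    return v


def moments_agree(A, F, x, y, r):
    """Theorem 4.5 test: A(x z')(F^k) = A(y z')(F^k) for k <= r and lg(z') <= n - r."""
    n = len(F)
    for k in range(1, r + 1):
        Fk = [Fr(f) ** k for f in F]  # componentwise power (F^k)
        for m in range(n - r + 1):
            for z in product(sorted(A), repeat=m):
                v = apply_word(A, z, Fk)
                if apply_word(A, x, v) != apply_word(A, y, v):
                    return False
    return True


def mean_and_variance(I, A, F, x):
    """E_I(x) and mu_2(x) = m_2 - m_1^2 from the moments about zero I A(x)(F^k)."""
    m1 = sum(p * v for p, v in zip(I, apply_word(A, x, [Fr(f) for f in F])))
    m2 = sum(p * v for p, v in zip(I, apply_word(A, x, [Fr(f) ** 2 for f in F])))
    return m1, m2 - m1 ** 2


h = Fr(1, 2)
A = {"a": [[h, h, 0], [h, h, 0], [0, 0, 1]]}  # hypothetical machine; A(a) is idempotent
F = [1, 3, 2]                                  # n_F = 3, so r = 2 is allowed
print(moments_agree(A, F, "a", "aa", 2))       # True
print(mean_and_variance([1, 0, 0], A, F, "a"), mean_and_variance([0, 0, 1], A, F, "a"))
```

The last line shows two starting states with the same mean 2 after input a but variances 1 and 0. Agreement of the expectation alone would not reveal this difference.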

Corollary 4.3 (Bound for the relation $R_F^N$ to hold)

Let A be a probabilistic sequential machine with n states and with $N \le n_F$. Then $xR_F^Ny$ if and only if for all strings z' with $\mathrm{lg}(z') \le n-N$

$$E_I(xz') = E_I(yz'), \quad \mu_2(xz') = \mu_2(yz'), \quad \ldots, \quad \mu_N(xz') = \mu_N(yz'), \quad \text{for all } I \in S.$$


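Theorem 4.6

Let A and A' be probabilistic sequential machines having n and n' states respectively. Then for all

$$r \le n_F + n_{F'} - \#\{\eta : \eta \in Y \cap Y' \text{ and } \eta \ne 0\}$$

and for any initial distributions $\pi$ in A and $\lambda$ in A',

$$\left.\begin{aligned} E_A(x) &= E_{A'}(x) \\ \mu_2^A(x) &= \mu_2^{A'}(x) \\ &\;\vdots \\ \mu_r^A(x) &= \mu_r^{A'}(x)\end{aligned}\right\}\ \forall x \in \Sigma^* \iff \left\{\begin{aligned} E_A(x') &= E_{A'}(x') \\ \mu_2^A(x') &= \mu_2^{A'}(x') \\ &\;\vdots \\ \mu_r^A(x') &= \mu_r^{A'}(x')\end{aligned}\right.\ \forall x': \mathrm{lg}(x') \le n+n'-r.$$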

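Proof: Construct $A^{\oplus} = A \oplus A'$ and let $V_0$ in Lemma 4.1 be

$$V_0 = \{(F^{\oplus})^k : k = 1,\ldots,r\},$$

with powers taken componentwise. Since the components of $F^{\oplus}$ are those of F and F', its independence number is

$$n_{F^{\oplus}} = \#\{\eta : \eta \in Y \cup Y' \text{ and } \eta \ne 0\} = n_F + n_{F'} - \#\{\eta : \eta \in Y \cap Y' \text{ and } \eta \ne 0\},$$

so $\dim\langle V_0\rangle = r$ for every r in the stated range. Using Lemma 4.1 and an argument like the one in Theorem 4.4 establishes the theorem. Q.E.D.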
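The bound on r in Theorem 4.6 is the independence number of the joined output vector $F^{\oplus}$: a nonzero value shared by F and F' counts only once. This Python sketch uses hypothetical names and treats the output alphabets as the value sets of F and F'. It computes the bound both from the counting formula and as the rank of the componentwise powers of $F^{\oplus}$. It then reports the resulting string length n + n' - r.

```python
from fractions import Fraction as Fr


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fr(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def r_bound(F, F2):
    """n_F + n_F' - #{eta in Y and Y' : eta != 0}, computed from the output values."""
    nz = lambda G: {Fr(g) for g in G if g != 0}
    return len(nz(F)) + len(nz(F2)) - len(nz(F) & nz(F2))


def joined_independence_number(F, F2):
    """Rank of the componentwise powers of the stacked vector (F, F')."""
    G = list(F) + list(F2)
    return rank([[Fr(g) ** i for g in G] for i in range(1, len(G) + 1)])


F, F2 = [1, 2, 2], [2, 1, 1]  # Y = Y' = {1, 2}, n = n' = 3
r = r_bound(F, F2)
print(r, joined_independence_number(F, F2), len(F) + len(F2) - r)  # 2 2 4, i.e. 2n - 2
```

With the output set {0, 1} instead, `r_bound([0, 1], [1, 0])` is 1, which is the one-step-higher bound discussed in the next section.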
4.6 DISCUSSION OF THE GENERALIZATION OF THE MOORE BOUND


Corollary 4.5

Let A and A' be n-state deterministic machines with two-valued output alphabet Y = Y' = {1,2}. Then A and A' are indistinguishable for all strings if they are indistinguishable for all strings of length at most 2n-2.

Proof: In Theorem 4.6 we have $n_F + n_{F'} - \#\{\eta \in Y \cap Y' : \eta \ne 0\} = 2+2-2 = 2$, so that $r \le 2$. For deterministic machines, indistinguishability reduces to $E_A(x) = E_{A'}(x)$ for all $x \in \Sigma^*$, and also

$$E_A(x) = E_{A'}(x) \implies \mu_2^A(x) = \mu_2^{A'}(x),$$

since both second central moments vanish. Hence the right side of Theorem 4.6, with r = 2 and n + n' - r = 2n - 2, gives the result. Q.E.D.
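For deterministic machines, Corollary 4.5 bounds the experiment needed to compare two machines. This minimal Python sketch assumes a Moore-style encoding: a transition table `delta[(state, symbol)]`, an output table `out[state]` with values in {1, 2}, and a start state. All names are hypothetical. It compares the outputs after every input string of length at most 2n - 2, where the empty string gives the initial output.

```python
from itertools import product


def output_after(machine, x):
    """Output of a deterministic Moore-type machine after reading x (x = '' gives the initial output)."""
    delta, out, q = machine
    for a in x:
        q = delta[(q, a)]
    return out[q]


def agree_up_to(m1, m2, inputs, L):
    """Compare the outputs of two machines on every input string of length <= L."""
    return all(output_after(m1, x) == output_after(m2, x)
               for k in range(L + 1) for x in product(inputs, repeat=k))


# Two hypothetical 2-state machines over inputs {a, b}; 'a' toggles the state, 'b' keeps it.
toggle = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
M1 = (toggle, {0: 1, 1: 2}, 0)
M2 = (toggle, {0: 2, 1: 1}, 1)  # same behaviour with the state names swapped
M3 = ({(0, "a"): 0, (1, "a"): 1, (0, "b"): 0, (1, "b"): 1}, {0: 1, 1: 2}, 0)  # never moves
n = 2
print(agree_up_to(M1, M2, "ab", 2 * n - 2), agree_up_to(M1, M3, "ab", 2 * n - 2))  # True False
```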

Theorem 4.6 can be regarded as a generalization of the Moore result [7] to probabilistic machines with arbitrary rather than binary output alphabets. Note that Moore's bound is 2n-1 because he counts the initial output as part of the experiment. We consider the initial outputs when considering strings of length 1, since the symbol Λ has the identity matrix as its symbol matrix.

The role of the zero output symbol in Theorem 4.6 is a significant departure from Moore's deterministic results. To get the same result as Moore in Corollary 4.5, it was necessary to pick a two-valued output set {1,2} rather than {0,1}. This choice carries the implicit assumption that such a recoding of output symbols cannot affect indistinguishability between deterministic machines. Without the recoding, r = 1 and the bound is one higher than the Moore bound.

However, in the probabilistic case, it seems reasonable for machines with a zero output symbol to have a different bound from machines with only nonzero symbols. A zero annihilating some probabilities in the expectation and higher moments can mask significant changes in distributions. It is clear from Theorems 1.8 and 1.8D that changing an $F_i$ from zero to nonzero can affect the kernel of F. In the extreme case this can make $R_F$ of infinite rather than finite rank and prevent the construction of an N-moment equivalent finite machine with deterministic switching.

The concept of probabilistic sequential machines (PSM), a generalization of Rabin's concept of probabilistic automata, is defined. Such diverse devices as unreliable digital computers, slot machines, and chemical cells are presented as examples of PSM. Using the examples as motivation, various kinds of equivalences between machines are discussed. The fundamental question of when a PSM is equivalent in some sense to a deterministic machine, perhaps with random devices attached to output states, is considered. Finally, various tests involving finitely many random variables are devised for each of the kinds of equivalences between PSM and for reduction, if possible, to deterministic machines. One of these tests generalizes the Moore bound for deterministic machines further than any previous result in the literature. (U)